Consider the following sequence of instructions:
sub reads regs 1 and 3 in cycle 2 and passes them to the ALU.
It is not until cycle 5 that it writes the answer to the Reg File.
and reads regs 2 and 5 in cycle 3 and passes them to the ALU.
Reg 2 has not yet been updated!
When and reads the Reg File it gets the wrong value for Reg 2.
This is called a data hazard.
Data hazards occur if an instruction reads a Register that a previous instruction overwrites in a future cycle.
We must eliminate data hazards or pipelining produces incorrect results.
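The definition above can be sketched as a small check over an instruction list. This is only an illustration: the `Instr` record, the `find_data_hazards` name, the window of two following instructions, and the destination register of the and instruction (12) are all assumptions made for the example, not details from the slides.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[int]      # register written, or None
    srcs: Tuple[int, ...]    # registers read


def find_data_hazards(program: List[Instr],
                      window: int = 2) -> List[Tuple[int, int, int]]:
    """Return (writer_index, reader_index, register) for each hazard.

    Assumption: an instruction that follows the writer by at most
    `window` slots reads the register before it has been written back.
    """
    hazards = []
    for i, writer in enumerate(program):
        if writer.dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if writer.dest in program[j].srcs:
                hazards.append((i, j, writer.dest))
            if program[j].dest == writer.dest:
                break  # a later write supersedes this one
    return hazards


program = [
    Instr("sub", 2, (1, 3)),   # writes reg 2
    Instr("and", 12, (2, 5)),  # reads reg 2; dest 12 is hypothetical
]
hazards = find_data_hazards(program)
print(hazards)
```

Here the single reported hazard is the and instruction reading reg 2 one slot after sub writes it.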
There are three ways to remove data hazards:
The first way is to give the responsibility to software.
We must delay the and instruction until after reg 2 has been written, by placing nops or unrelated instructions between it and the sub.
This puts the responsibility of removing hazards onto the compiler writer, and involves no extra hardware.
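A compiler pass of this kind might look roughly like the sketch below. It assumes that a reader must sit at least three slots after the writer (true if the Reg File can be written and read in the same cycle, which the slides do not state); `insert_nops`, `min_distance`, and the and instruction's destination register are hypothetical names and values.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[int]      # register written, or None
    srcs: Tuple[int, ...]    # registers read


NOP = Instr("nop", None, ())


def insert_nops(program: List[Instr], min_distance: int = 3) -> List[Instr]:
    """Pad the program so every reader is min_distance slots after its writer."""
    out: List[Instr] = []
    window = min_distance - 1
    for instr in program:
        recent = out[-window:] if window > 0 else []
        needed = 0
        # back = 1 is the instruction emitted immediately before this one
        for back, prev in enumerate(reversed(recent), start=1):
            if prev.dest is not None and prev.dest in instr.srcs:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append(instr)
    return out


program = [
    Instr("sub", 2, (1, 3)),
    Instr("and", 12, (2, 5)),  # dest 12 is hypothetical
]
padded = insert_nops(program)
print([i.op for i in padded])
```

Under the stated assumption this places two nops between the sub and the and.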
The second way is to implement hardware to detect hazards, if they exist, and to insert a delay in the circuit - a stall.
After we detect a hazard we stall the instruction memory so that no new instructions are read until after the stall is resolved. Detecting the hazard condition involves checking the write register in each of the pipeline registers:
If the write register is equal to either of the two read registers in the IF/ID register, then a stall occurs.
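The comparison described above can be modelled in a few lines. This is a behavioural sketch, not a circuit description: `must_stall` and its parameters are hypothetical names, and which pipeline registers have to be checked depends on the Reg File timing.

```python
from typing import Iterable, Optional, Tuple


def must_stall(if_id_reads: Tuple[int, int],
               pending_writes: Iterable[Optional[int]]) -> bool:
    """True if any pending write register matches a read register in IF/ID.

    pending_writes holds the write register number stored in each later
    pipeline register, or None where that stage writes no register.
    """
    return any(w is not None and w in if_id_reads for w in pending_writes)


# and is in IF/ID wanting regs 2 and 5; sub (writing reg 2) is one stage ahead.
stall_and = must_stall((2, 5), [2, None])
# Same reads, but the instruction ahead writes reg 7 instead.
stall_other = must_stall((2, 5), [7, None])
print(stall_and, stall_other)
```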
The third way is to notice that although reg 2 does not contain valid information at the time that the next instruction wants to read it, the correct information is available!
It just isn’t in the right place.
The EX/MEM register contains the output from the ALU.
The result that is destined for reg 2 is part of the EX/MEM register.
We can insert the output of the EX/MEM register into the input of the ALU - data forwarding.
This again involves extra hardware.
It removes all of the stalls and nops of the two previous solutions.
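The forwarding decision for one ALU input can be sketched as a simple selection. Only the EX/MEM path mentioned above is modelled; the function and parameter names are hypothetical and the numeric values are made up for the example.

```python
from typing import Optional


def alu_operand(src_reg: int,
                reg_file_value: int,
                ex_mem_dest: Optional[int],
                ex_mem_alu_out: int) -> int:
    """Choose the value presented to one ALU input.

    If the instruction one stage ahead is about to write src_reg, take its
    ALU result from the EX/MEM register instead of the stale Reg File value.
    """
    if ex_mem_dest is not None and ex_mem_dest == src_reg:
        return ex_mem_alu_out
    return reg_file_value


# reg 2 still holds a stale 10 in the Reg File; sub's result 7 sits in EX/MEM.
forwarded = alu_operand(2, 10, ex_mem_dest=2, ex_mem_alu_out=7)
# reg 5 is not being written, so the Reg File value is used.
unforwarded = alu_operand(5, 99, ex_mem_dest=2, ex_mem_alu_out=7)
print(forwarded, unforwarded)
```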
Unfortunately it does not work in all cases:
In this case the load instruction doesn’t get the data from memory until the DM cycle finishes.
The data is not available until after it is required.
We have to insert a stall.
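The timing argument can be checked with a little arithmetic. The sketch assumes a five-stage pipeline numbered IF=1, ID=2, EX=3, DM=4, WB=5 with full forwarding; `stalls_needed` and the table are hypothetical names introduced for the example.

```python
# Stage at the end of which the producer's result exists (assumed numbering).
RESULT_STAGE = {"alu": 3, "lw": 4}  # ALU result after EX, load data after DM
EX_STAGE = 3


def stalls_needed(producer_kind: str, distance: int) -> int:
    """Stalls for a consumer issued `distance` slots after the producer.

    The producer is fetched in cycle 1, so its result is ready at the end of
    cycle RESULT_STAGE[kind]. The consumer enters EX in cycle
    distance + EX_STAGE and needs the value by the end of the cycle before.
    """
    ready_end = RESULT_STAGE[producer_kind]
    needed_by_end = distance + EX_STAGE - 1
    return max(0, ready_end - needed_by_end)


print(stalls_needed("alu", 1))  # ALU result, used by the next instruction
print(stalls_needed("lw", 1))   # loaded value, used by the next instruction
print(stalls_needed("lw", 2))   # loaded value, used two instructions later
```

With these assumptions only the load followed immediately by its consumer needs a stall.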
The original MIPS chip relied on the compiler writer to insert nops - saving the hardware space that would otherwise be needed to control stalls.
These days processors must run as fast as possible. The extra hardware for forwarding/stalling is included on the chip.
If the compiler re-orders instructions, hardware stalls for load instructions can be eliminated:
lw $3,200($11) ;|
add $4,$2,$3 ;|one stall added here
Two values are copied from memory to regs 2 and 3, and are then added together.
A further value is then copied from memory to reg 5.
The second lw, into reg 3, involves a stall because the add that follows it needs reg 3 immediately.
If the compiler reorders the instructions:
add $4,$2,$3 ;$3 is available!
The result is the same - but now the add instruction is able to get the value of reg 3 from the forwarding unit without stalling.
The load of reg 5 has inserted one extra stage into the pipeline between the load of reg 3 and the add.
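The effect of the reordering can be checked by counting load-use pairs. The sketch below assumes forwarding is present, so the only remaining stall is a load followed directly by an instruction that reads the loaded register. Only register numbers are modelled; using reg 11 as the base register for every load is an assumption, since the slides show it for one load only.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[int]      # register written, or None
    srcs: Tuple[int, ...]    # registers read


def count_load_use_stalls(program: List[Instr]) -> int:
    """Count loads whose result is read by the very next instruction."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.op == "lw" and prev.dest in cur.srcs:
            stalls += 1
    return stalls


lw2 = Instr("lw", 2, (11,))    # base register assumed
lw3 = Instr("lw", 3, (11,))
add = Instr("add", 4, (2, 3))
lw5 = Instr("lw", 5, (11,))    # base register assumed

original = [lw2, lw3, add, lw5]
reordered = [lw2, lw3, lw5, add]
print(count_load_use_stalls(original), count_load_use_stalls(reordered))
```

Moving the load of reg 5 ahead of the add is safe here because the add neither reads nor writes reg 5.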
As soon as we branch to a new instruction, all the instructions that are in the pipeline behind the branch become invalid!
+100 sub $7,$8,$9
Either the add or sub instruction after the beq will be executed depending on the contents of regs 2 and 3.
We can include extra hardware to calculate the branch offset in the decode cycle. Data forwarding then makes it possible to do the branch just one cycle later, so only a single nop has to be inserted after the branch.
A clever compiler can eliminate the effect of the delay by inserting an instruction after the branch!
This can be the previous instruction!
(If it is not involved in the branch.)
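A compiler step that fills the slot after a branch might be sketched as follows. It is a simplified illustration: it only considers the instruction directly before the branch, and `fill_delay_slot` and the example instructions are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[int]      # register written, or None
    srcs: Tuple[int, ...]    # registers read


NOP = Instr("nop", None, ())


def fill_delay_slot(program: List[Instr], branch_index: int) -> List[Instr]:
    """Move the instruction before the branch into the slot after it.

    This is only done when that instruction does not write a register the
    branch compares; otherwise a nop is placed after the branch.
    """
    out = list(program)
    branch = program[branch_index]
    if branch_index > 0:
        prev = program[branch_index - 1]
        if prev.dest is None or prev.dest not in branch.srcs:
            out[branch_index - 1], out[branch_index] = branch, prev
            return out
    out.insert(branch_index + 1, NOP)
    return out


beq = Instr("beq", None, (2, 3))            # compares regs 2 and 3
independent = Instr("add", 4, (5, 6))       # does not touch regs 2 or 3
dependent = Instr("add", 2, (5, 6))         # writes reg 2, used by the branch

moved = fill_delay_slot([independent, beq], 1)
padded = fill_delay_slot([dependent, beq], 1)
print([i.op for i in moved], [i.op for i in padded])
```

In the first case the add ends up after the beq and no cycle is wasted; in the second the add must stay in front and a nop fills the slot.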
can be re-ordered: