Pipeline Hazards

Pipeline Hazards or Danger! Danger! Danger!

CSE 240A

Dean Tullsen

Data Hazards

ADD R1, R2, R3
SUB R4, R5, R1
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11

Data ___________ may result in data ____________.

Data Hazards

add R1, R2, R3
sub R4, R1, R6
and R6, R1, R2
or R8, R11, R1
xor R10, R1, R5

Data Hazard

addi R3, R1, #35
add R6, R3, R1
sub R7, R6, R3

Data Dependence

• Data hazards are caused by data dependences
• Data dependences, and thus data hazards, come in 3 flavors (not all of which apply to this pipeline).
  – RAW (read-after-write)
  – WAW (write-after-write)
  – WAR (write-after-read)

RAW Hazard

• later instruction tries to read an operand before earlier instruction writes it
• The dependence
    add R1, R2, R3
    sub R5, R1, R4
• The hazard
    add R1, R2, R3    IF  ID  EX  MEM WB
    sub R5, R1, R4        IF  ID  EX  MEM WB
• RAW hazard is extremely common

WAW Hazard

• later instruction tries to write an operand before earlier instruction writes it
• The dependence
    add R1, R2, R3
    sub R1, R2, R4
• The hazard
    lw  R1, R2, R3    IF  ID  EX  MEM MEM2 MEM3 WB
    sub R1, R2, R4        IF  ID  EX  MEM WB
• WAW hazard possible in a reasonable pipeline, but not in the very simple pipeline we’re assuming.

WAR Hazard

• later instruction tries to write an operand before earlier instruction reads it
• The dependence
    add R1, R2, R3
    sub R2, R5, R4
• The hazard?
    add R1, R2, R3    IF  ID  EX  MEM WB
    sub R2, R5, R4        IF  ID  EX  MEM WB
• WAR hazard is uncommon/impossible in a reasonable (in-order) pipeline
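The three dependence kinds defined above can be checked mechanically from the registers each instruction reads and writes. The following is a minimal sketch, not part of the original slides; the `Instr` tuple and the `dependences` function are hypothetical names introduced only for illustration, and each instruction is assumed to write exactly one register.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: one destination, any number of sources."""
    op: str
    dest: str
    srcs: Tuple[str, ...]


def dependences(earlier: Instr, later: Instr) -> Set[str]:
    """Return the kinds of data dependence that `later` has on `earlier`."""
    kinds = set()
    if earlier.dest in later.srcs:
        kinds.add("RAW")  # later reads what earlier writes
    if earlier.dest == later.dest:
        kinds.add("WAW")  # both write the same register
    if later.dest in earlier.srcs:
        kinds.add("WAR")  # later writes what earlier reads
    return kinds


add = Instr("add", "R1", ("R2", "R3"))
print(dependences(add, Instr("sub", "R5", ("R1", "R4"))))  # RAW example from the slides
print(dependences(add, Instr("sub", "R1", ("R2", "R4"))))  # WAW example from the slides
print(dependences(add, Instr("sub", "R2", ("R5", "R4"))))  # WAR example from the slides
```

Whether a dependence becomes a hazard depends on the pipeline: the check above only finds the dependence, which is a property of the program.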

Dealing with Data Hazards through Forwarding

Solutions?

ADD R1, R2, R3
SUB R4, R1, R5
AND R6, R4, R7

add R1, R2, R3
sub R4, R1, R6
and R6, R1, R2
or R8, R11, R1
xor R10, R1, R5

Forwarding Options

(I’m letting ADD stand in for all ALU operations)

• ADD -> ADD
• ADD -> LW
• ADD -> SW (2 operands)
• LW -> ADD
• LW -> LW
• LW -> SW (2 operands)

More Forwarding

Forwarding in the Pipeline

• (Picture from undergrad text)

Forwarding and Stalling

lw R1, 0(R2)
sub R4, R1, R6
and R6, R1, R7
or R8, R1, R9

Example

ADD R1, R2, R3
SW R1, 1000(R2)
LW R7, 2000(R2)
ADD R5, R7, R1
LW R8, 2004(R2)
SW R7, 2008(R8)
ADD R8, R8, R2
LW R9, 1012(R8)
SW R9, 1016(R8)

Avoiding Pipeline Stalls

lw R1, 1000(R2)
lw R3, 2000(R2)
add R4, R1, R3
lw R1, 3000(R2)
add R6, R4, R1
sw R6, 1000(R2)

• this is a compiler technique called instruction scheduling.

How big a problem are these pipeline stalls?

• 13% of the loads in FP programs
• 25% of the loads in integer programs

Detecting ALU Input Hazards

[Figure: pipeline registers around the ALU. ID/EX holds opcode, rd, rs1, rs2; EX/MEM holds opcode, rd; MEM/WB holds opcode, rd.]
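The comparison the figure suggests (source fields in ID/EX against the rd fields in EX/MEM and MEM/WB) can be sketched as below. This is an illustrative model, not the slides' actual control logic: the `Latch` record, its `writes_reg` flag (standing in for decoding the opcode) and `forward_source` are hypothetical names. It also ignores two real-world details as simplifying assumptions: a hardwired zero register, and a load sitting in EX/MEM whose data is not yet available.

```python
from typing import NamedTuple, Optional


class Latch(NamedTuple):
    """Hypothetical view of one pipeline register, reduced to what the check needs."""
    writes_reg: bool      # assumed to be derived from the opcode field
    rd: Optional[str]     # destination register, if any


def forward_source(src: str, ex_mem: Latch, mem_wb: Latch) -> str:
    """Pick where an ALU input named `src` should come from."""
    # The youngest producer wins, so EX/MEM is checked before MEM/WB.
    if ex_mem.writes_reg and ex_mem.rd == src:
        return "EX/MEM"
    if mem_wb.writes_reg and mem_wb.rd == src:
        return "MEM/WB"
    return "REGFILE"


# add R1, R2, R3 ; sub R4, R1, R6 ; and R6, R1, R2
add_latch = Latch(True, "R1")
sub_latch = Latch(True, "R4")
empty = Latch(False, None)

# sub is in EX while add is in MEM:
print(forward_source("R1", ex_mem=add_latch, mem_wb=empty))      # EX/MEM
print(forward_source("R6", ex_mem=add_latch, mem_wb=empty))      # REGFILE
# and is in EX while sub is in MEM and add is in WB:
print(forward_source("R1", ex_mem=sub_latch, mem_wb=add_latch))  # MEM/WB
```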

Adding Datapaths

Inserting Bubbles

• Set all control values in the EX/MEM register to safe values (equivalent to a nop)
• Keep same values in the ID/EX register and IF/ID register
• Keep PC from incrementing
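To get a feel for how often a bubble is needed, the sketch below counts load-use stalls in a short instruction sequence. It is a simplified model under stated assumptions, not the slides' hardware: full forwarding is assumed, so the only stall is one bubble when an instruction reads a register loaded by the immediately preceding `lw`. The tuple encoding `(op, dest, srcs)` and the function name are hypothetical.

```python
def count_load_use_stalls(program):
    """Count one-cycle bubbles, assuming only load-use pairs stall.

    `program` is a list of (op, dest, srcs) tuples in program order.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        # A bubble is needed when the next instruction reads the loaded register.
        if prev_op == "lw" and prev_dest in cur_srcs:
            stalls += 1
    return stalls


# The sequence from the "Avoiding Pipeline Stalls" slide.
sequence = [
    ("lw", "R1", ("R2",)),
    ("lw", "R3", ("R2",)),
    ("add", "R4", ("R1", "R3")),
    ("lw", "R1", ("R2",)),
    ("add", "R6", ("R4", "R1")),
    ("sw", None, ("R6", "R2")),
]
print(count_load_use_stalls(sequence))  # 2
```

Under this model each `add` directly follows the `lw` that produces one of its operands, so the sequence pays two bubbles. Instruction scheduling tries to move independent work between each load and its first use.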

Control Hazards

• Result from branch or control ______________
• Instructions are not only dependent on instructions that produce their operands, but also on all previous control flow (branch, jump) instructions that lead to that instruction.

[Figure: example instruction sequence containing add, sub, and instructions with bne and beq branches; one branch is marked "branch taken"]

Old Datapath

Branch Hazards

Branch                IF  ID  EX  MEM WB
I2                        IF  ID  EX  MEM WB
I3                            IF  ID  EX  MEM WB
I4                                IF  ID  EX  MEM WB
correct instruction                   IF  ID  EX  MEM WB

Branch Stall Impact

• If CPI = 1, 30% branch, Stall 3 cycles => new CPI = ????
• Two part solution:
  – Determine branch taken or not sooner, AND
  – Compute taken branch address earlier
• (limited MIPS) branch tests if register = 0 or ≠ 0
• MIPS Solution:
  – Move Zero test to ID/RF stage
  – Adder to calculate new PC in ID/RF stage
  – 1 clock cycle penalty for branch versus 3
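The stall-impact question above can be worked with the usual relation: effective CPI = base CPI + (branch frequency × stall cycles per branch). The short sketch below applies it to the slide's numbers; the function name is hypothetical, and the model assumes every branch pays the full penalty and that there are no other stalls.

```python
def cpi_with_branch_stalls(base_cpi, branch_fraction, penalty_cycles):
    """Effective CPI when every branch costs `penalty_cycles` extra cycles."""
    return base_cpi + branch_fraction * penalty_cycles


# Numbers from the slide: CPI = 1, 30% branches.
three_cycle = cpi_with_branch_stalls(1.0, 0.30, 3)  # branch resolved late
one_cycle = cpi_with_branch_stalls(1.0, 0.30, 1)    # zero test and adder moved to ID/RF
print(f"3-cycle penalty: CPI = {three_cycle:.2f}")
print(f"1-cycle penalty: CPI = {one_cycle:.2f}")
```

With a 3-cycle stall the CPI comes out at about 1.9, nearly double the ideal; cutting the penalty to 1 cycle brings it down to about 1.3.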

New Datapath

Branch Hazards

Branch                IF  ID  EX  MEM WB
I2                        IF  ID  EX  MEM WB
correct instruction           IF  ID  EX  MEM WB
I4                                IF  ID  EX  MEM WB
I5                                    IF  ID  EX  MEM WB

What We Know About Branches

• more conditional branches than unconditional
• more forward than backward
• 67% of branches taken
• backward branches taken 80%

Four Branch Hazard Alternatives

• • • • 4 – delayed branch

#1 _______ until branch direction clear

• Problems?

#2 Predict Branch Not Taken

• Execute successor instructions in sequence
• “Squash” instructions in pipeline if branch actually taken
• Advantage of late pipeline state update
• 33% MIPS branches not taken on average
• PC+4 already calculated, so use it to get next instruction
• This is what the pipeline is doing, anyway

#3 Predict Branch Taken

• 67% MIPS branches taken on average
• But haven’t calculated branch target address in this MIPS architecture
  – MIPS still incurs 1 cycle branch penalty
  – Other machines: branch target known before outcome

Fourth Branch Hazard Alternative -- Delayed Branch

– Define branch to take place AFTER a following instruction

    branch instruction
    sequential successor_1
    sequential successor_2
    ........
    sequential successor_n
    branch target if taken

  (successor_1 through successor_n: branch delay of length n)

– 1 slot delay allows proper decision and branch target address in 5 stage pipeline
– MIPS uses this

Delayed Branch

• Where to get instructions to fill branch delay slot?
  – Before branch instruction
  – From the target address: only valuable when branch taken
  – From fall through: only valuable when branch not taken
  – Cancelling branches allow more slots to be filled
• Compiler effectiveness for single branch delay slot:
  – Fills about 60% of branch delay slots
  – About 80% of instructions executed in branch delay slots useful in computation
  – About 50% (60% x 80%) of slots usefully filled

Key Points

• Hard to keep the pipeline completely full
• Data hazards require dependent instructions to wait for the producer instruction
  – Most of the problem handled with forwarding (bypassing)
  – Sometimes stall still required (especially in modern processors)
• Control hazards require control-dependent (post-branch) instructions to wait for the branch to be resolved
• ET = IC * CPI * CT
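As a rough illustration of how the branch alternatives feed into ET = IC * CPI * CT, the sketch below combines figures quoted on different slides: 30% branches, a 1-cycle branch penalty, 67% of branches taken, and 60% x 80% of delay slots usefully filled. Combining them this way is an assumption made here, as are the base CPI of 1, the absence of other stalls, and the instruction count and cycle time, which are purely hypothetical values.

```python
def execution_time(ic, cpi, ct):
    """ET = IC * CPI * CT (instruction count, cycles per instruction, cycle time)."""
    return ic * cpi * ct


BRANCH_FRACTION = 0.30        # from the Branch Stall Impact slide
PENALTY = 1                   # cycles, with the zero test moved to ID/RF
TAKEN = 0.67                  # fraction of branches taken
SLOTS_USEFUL = 0.60 * 0.80    # delay slots usefully filled

cpi = {
    # Every branch pays the penalty.
    "stall": 1 + BRANCH_FRACTION * PENALTY,
    # Only taken branches pay the penalty.
    "predict not taken": 1 + BRANCH_FRACTION * TAKEN * PENALTY,
    # Only delay slots without useful work are wasted.
    "delayed branch": 1 + BRANCH_FRACTION * (1 - SLOTS_USEFUL) * PENALTY,
}

IC = 1_000_000   # hypothetical instruction count
CT = 1e-9        # hypothetical cycle time in seconds

for name, value in cpi.items():
    et_ms = execution_time(IC, value, CT) * 1e3
    print(f"{name}: CPI = {value:.3f}, ET = {et_ms:.3f} ms")
```

Since IC and CT are held fixed here, the differences in ET come entirely from CPI, which is the term that hazards affect.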