Edited by: yn灬不离不弃灬 | 2019-07-06
2012 Computer Science Cornell University
See P&H Appendix 4.8

Goals for Today
  Recap: Data Hazards
  Control Hazards
    What is the next instruction to execute if a branch is taken? Not taken?
    How to resolve control hazards
    Optimizations

MIPS Design Principles
  Simplicity favors regularity
    32-bit instructions
  Smaller is faster
    Small register file
  Make the common case fast
    Include support for constants
  Good design demands good compromises
    Support for different types of interpretations/classes

Recall: MIPS instruction formats
  All MIPS instructions are 32 bits long and have 3 formats

  R-type   op      rs      rt      rd      shamt   func
           6 bits  5 bits  5 bits  5 bits  5 bits  6 bits
  I-type   op      rs      rt      immediate
           6 bits  5 bits  5 bits  16 bits
  J-type   op      immediate (target address)
           6 bits  26 bits

Recall: MIPS Instruction Types
  Arithmetic/Logical
    R-type: result and two source registers, shift amount
    I-type: 16-bit immediate with sign/zero extension
  Memory Access
    load/store between registers and memory
    word, half-word and byte operations
  Control flow
    conditional branches: pc-relative addresses
    jumps: fixed offsets, register absolute

Recall: MIPS Instruction Types
  Arithmetic/Logical
    ADD, ADDU, SUB, SUBU, AND, OR, XOR, NOR, SLT, SLTU
    ADDI, ADDIU, ANDI, ORI, XORI, LUI, SLL, SRL, SLLV, SRLV, SRAV, SLTI, SLTIU
    MULT, DIV, MFLO, MTLO, MFHI, MTHI
  Memory Access
    LW, LH, LB, LHU, LBU, LWL, LWR
    SW, SH, SB, SWL, SWR
  Control flow
    BEQ, BNE, BLEZ, BLTZ, BGEZ, BGTZ
    J, JR, JAL, JALR, BEQL, BNEL, BLEZL, BGTZL
  Special
    LL, SC, SYSCALL, BREAK, SYNC, COPROC

Pipelined Processor
  [Figure: datapath with PC, new pc, memory, register file, extend, control, alu, memory (din, dout, addr) and compute jump/branch targets, divided into the stages Fetch, Decode, Execute, Memory, WB]

Pipelined Processor
  [Figure: the same datapath with the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB carrying inst, imm, A, B, D, M and ctrl between stages]

Time Graphs
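As a rough illustration of a time graph, the sketch below lays out an ideal five-stage pipeline with no hazards and derives latency, completion rate and concurrency from it. The names `time_graph` and `render` are hypothetical helpers, not part of the lecture, and the model assumes one instruction is fetched every cycle.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def time_graph(instructions):
    """Map each instruction to {cycle: stage} for an ideal pipeline (no hazards)."""
    rows = []
    for i, name in enumerate(instructions):
        # Instruction i is fetched in cycle i + 1 and advances one stage per cycle.
        rows.append((name, {i + 1 + s: stage for s, stage in enumerate(STAGES)}))
    return rows

def render(rows):
    """Format the rows as a text time graph, one column per clock cycle."""
    last = max(max(row) for _, row in rows)
    lines = ["cycle " + "".join(f"{c:<5}" for c in range(1, last + 1))]
    for name, row in rows:
        cells = "".join(f"{row.get(c, ''):<5}" for c in range(1, last + 1))
        lines.append(f"{name:<6}" + cells)
    return "\n".join(lines)

program = ["add", "nand", "lw", "add", "sw"]
rows = time_graph(program)

total_cycles = max(max(row) for _, row in rows)
latency = len(STAGES)                      # cycles from IF to WB for one instruction
finish = [max(row) for _, row in rows]     # cycle in which each instruction is in WB
cycles_per_instr = [b - a for a, b in zip(finish, finish[1:])]
concurrency = max(sum(c in row for _, row in rows)
                  for c in range(1, total_cycles + 1))

print(render(rows))
print(latency, cycles_per_instr, concurrency)
```

For the five instructions used here the graph spans 9 cycles, each instruction takes 5 cycles from fetch to write-back, one instruction completes per cycle once the pipeline is full, and up to 5 instructions are in flight at once.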
Clock cycle   1    2    3    4    5    6    7    8    9
add           IF   ID   EX   MEM  WB
nand               IF   ID   EX   MEM  WB
lw                      IF   ID   EX   MEM  WB
add                          IF   ID   EX   MEM  WB
sw                                IF   ID   EX   MEM  WB

Latency: 5 cycles
Throughput: 1 instr/cycle
Concurrency: 5
CPI = 1

Next Goal
  What about data dependencies (also known as a data hazard in a pipelined processor)?
  i.e. add r3, r1, r2
       sub r5, r3, r4

Data Hazards
  register file reads occur in stage 2 (ID)
  register file writes occur in stage 5 (WB)
  next instructions may read values about to be written
  hazard condition: destination reg of earlier instruction == source reg of current
  earlier = started earlier = stage to the right (the current instruction is in a stage to the left)

Data Hazards
  Stall
    Pause current and all subsequent instructions
  Forward/Bypass
    Try to steal correct value from elsewhere in pipeline
    Otherwise, fall back to stalling or require a delay slot
  Tradeoffs?

Data Hazards
  [Figure: pipelined datapath with inst mem, data mem, the pipeline registers IF/ID, ID/Ex, Ex/Mem, Mem/WB (fields Ra, Rb, Rd, A, B, D, M, imm, WE, MC), a detect hazard unit and a forward unit]
  stall = If(IF/ID.Ra ≠ 0 && (IF/ID.Ra == ID/Ex.Rd || IF/ID.Ra == Ex/M.Rd || IF/ID.Ra == M/W.Rd))

Data Hazards
  [Figure: the same datapath with the forward unit and detect hazard unit]
  Three types of forwarding/bypass
    Forwarding from Ex/Mem registers to Ex stage (M→Ex)
    Forwarding from Mem/WB register to Ex stage (W→Ex)
    RegisterFile Bypass

Stalling
  Pause current and all subsequent instructions
  Slows down the pipeline

Stalling
Clock cycle
1  2  3  4  5  6  7  8
add r3, r1, r2:  IF, ID (11=r1, 22=r2), EX (D=33), MEM (D=33), WB (r3=33)
sub r5, r3, r5:  IF, ID (?=r3), ID (?=r3), ID (?=r3), ID (33=r3), EX, MEM, WB
or r6, r3, r4:   IF, IF, IF, IF, ID (33=r3), EX, M
add r6, r3, r8:  IF, ID (33=r3), EX
(horizontal axis: time)

Stalling
Clock cycle
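The stall condition from the Data Hazards slides and the stalled time graph for `add r3, r1, r2` followed by `sub r5, r3, r5` can be tied together with a small simulation. This is only a sketch under stated assumptions: `Instr`, `must_stall` and `simulate` are hypothetical names, registers are plain integers, there is no forwarding or register-file bypass, and applying the test to Rb as well as Ra is an assumption (the slide writes it out for Ra).

```python
from collections import namedtuple

# Hypothetical instruction record: destination rd and sources ra, rb (register numbers).
Instr = namedtuple("Instr", "text rd ra rb")

def must_stall(src, id_ex_rd, ex_mem_rd, mem_wb_rd):
    """Slide condition: src != 0 and src equals an Rd still in flight."""
    return src != 0 and src in (id_ex_rd, ex_mem_rd, mem_wb_rd)

def simulate(program):
    """Return {instruction index: list of stages it occupies, one entry per cycle}."""
    slots = {"IF": None, "ID": None, "EX": None, "MEM": None, "WB": None}
    trace = {i: [] for i in range(len(program))}
    next_fetch, stall = 0, False
    while True:
        # Advance the back of the pipeline by one cycle.
        slots["WB"], slots["MEM"] = slots["MEM"], slots["EX"]
        if stall:
            slots["EX"] = None  # bubble; the instructions in IF and ID stay put
        else:
            slots["EX"], slots["ID"] = slots["ID"], slots["IF"]
            slots["IF"] = next_fetch if next_fetch < len(program) else None
            next_fetch += 1
        if all(i is None for i in slots.values()):
            return trace
        for stage, i in slots.items():
            if i is not None:
                trace[i].append(stage)
        # Hazard detection in ID against the Rd of the instructions in EX, MEM, WB.
        rds = [program[slots[s]].rd if slots[s] is not None else None
               for s in ("EX", "MEM", "WB")]
        cur = slots["ID"]
        stall = cur is not None and (must_stall(program[cur].ra, *rds)
                                     or must_stall(program[cur].rb, *rds))

program = [
    Instr("add r3, r1, r2", rd=3, ra=1, rb=2),
    Instr("sub r5, r3, r5", rd=5, ra=3, rb=5),
    Instr("or r6, r3, r4", rd=6, ra=3, rb=4),
    Instr("add r6, r3, r8", rd=6, ra=3, rb=8),
]
trace = simulate(program)
for i, ins in enumerate(program):
    print(f"{ins.text:<16}", " ".join(trace[i]))
```

Under these assumptions `sub` sits in ID for four cycles (three stalls) until `add` has left WB, and `or` is held in IF for the same four cycles, which matches the rows of the stalled time graph.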