Fudan University: "Computer System" course PPT slides, Pipelined Implementation Part II

Pipelined Implementation Part II

Overview
Make the pipelined processor work!
Data Hazards
◼ An instruction with register R as a source follows shortly after an instruction with register R as its destination
◼ Common condition; we don't want to slow down the pipeline
Control Hazards
◼ Mispredicted conditional branch
  ⚫ Our design predicts all branches as being taken
  ⚫ Naïve pipeline executes two extra instructions
◼ Getting the return address for a ret instruction
  ⚫ PIPE- executes three extra instructions
Making Sure It Really Works
◼ What if multiple special cases happen simultaneously?

Suggested Reading
◼ Chap 4.5

Branch Misprediction Example
demo-j.ys

    0x000:    xorl %eax,%eax
    0x002:    jne t             # Not taken
    0x007:    irmovl $1,%eax    # Fall through
    0x00d:    nop
    0x00e:    nop
    0x00f:    nop
    0x010:    halt
    0x011: t: irmovl $3,%edx    # Target (should not execute)
    0x017:    irmovl $4,%ecx    # Should not execute
    0x01d:    irmovl $5,%edx    # Should not execute

◼ Should only execute first 7 instructions
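The claim that only the first 7 instructions should execute can be checked by running the listing through a toy instruction-level interpreter. The following is a minimal Python sketch rather than the course's simulator: `PROGRAM` and `run` are hypothetical names introduced here, only the opcodes that appear in demo-j.ys are modelled, and the instruction sizes are inferred from the addresses in the listing.

```python
# Hypothetical encoding of demo-j.ys:
# address -> (opcode, operand A, operand B, size in bytes)
PROGRAM = {
    0x000: ("xorl", "%eax", "%eax", 2),
    0x002: ("jne", 0x011, None, 5),
    0x007: ("irmovl", 1, "%eax", 6),
    0x00d: ("nop", None, None, 1),
    0x00e: ("nop", None, None, 1),
    0x00f: ("nop", None, None, 1),
    0x010: ("halt", None, None, 1),
    0x011: ("irmovl", 3, "%edx", 6),
    0x017: ("irmovl", 4, "%ecx", 6),
    0x01d: ("irmovl", 5, "%edx", 6),
}


def run(program, pc=0):
    """Execute sequentially (no pipeline); return (executed addresses, registers)."""
    regs, zf, executed = {}, False, []
    while True:
        op, a, b, size = program[pc]
        executed.append(pc)
        next_pc = pc + size
        if op == "xorl":
            regs[b] = regs.get(a, 0) ^ regs.get(b, 0)
            zf = regs[b] == 0          # xorl sets the zero flag
        elif op == "irmovl":
            regs[b] = a
        elif op == "jne" and not zf:
            next_pc = a                # taken only when ZF is clear
        elif op == "halt":
            return executed, regs
        pc = next_pc


executed, regs = run(PROGRAM)
print(len(executed), regs)  # prints: 7 {'%eax': 1}
```

Because `xorl %eax,%eax` sets the zero flag, `jne` falls through, so none of the three instructions at the target `t` appear in `executed`. A correct pipeline has to produce the same architectural result.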

Branch Misprediction Trace
# demo-j

                                                1 2 3 4 5 6 7 8 9
    0x000:    xorl %eax,%eax                    F D E M W
    0x002:    jne t             # Not taken       F D E M W
    0x011: t: irmovl $3,%edx    # Target            F D E M W
    0x017:    irmovl $4,%ecx    # Target+1            F D E M W
    0x007:    irmovl $1,%eax    # Fall through          F D E M W

Cycle 5
    M: M_Bch = 0, M_valA = 0x007
    E: valE ← 3, dstE = %edx
    D: valC = 4, dstE = %ecx
    F: valC ← 1, rB ← %eax

◼ Incorrectly execute two instructions at branch target
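The cycle 5 snapshot follows directly from the fetch cycles in the trace. As a rough illustration, the Python sketch below recomputes which instruction occupies each stage in a given cycle; `stage_in_cycle`, `snapshot`, and `TRACE` are hypothetical names, and the sketch assumes every instruction advances one stage per cycle with no stalls or bubbles, as in the naive pipeline shown here.

```python
STAGES = ["F", "D", "E", "M", "W"]

# (instruction, cycle in which it is fetched), read off the trace.
TRACE = [
    ("xorl %eax,%eax", 1),
    ("jne t", 2),
    ("irmovl $3,%edx", 3),   # target, fetched on the "taken" prediction
    ("irmovl $4,%ecx", 4),   # target+1
    ("irmovl $1,%eax", 5),   # fall through, fetched once M_Bch = 0 is visible
]


def stage_in_cycle(fetch_cycle, cycle):
    """Stage occupied in `cycle`, or None if the instruction is not in the pipeline."""
    offset = cycle - fetch_cycle
    return STAGES[offset] if 0 <= offset < len(STAGES) else None


def snapshot(trace, cycle):
    """Map each occupied stage to the instruction it holds in `cycle`."""
    result = {}
    for name, fetch_cycle in trace:
        stage = stage_in_cycle(fetch_cycle, cycle)
        if stage is not None:
            result[stage] = name
    return result


cycle5 = snapshot(TRACE, 5)
for stage in STAGES:
    print(stage, cycle5[stage])
```

In cycle 5 the two target instructions already sit in decode and execute, which is why, without further control logic, they go on to complete.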

Return Example
demo-ret.ys

    0x000:    irmovl Stack,%esp  # Initialize stack pointer
    0x006:    nop                # Avoid hazard on %esp
    0x007:    nop
    0x008:    nop
    0x009:    call p             # Procedure call
    0x00e:    irmovl $5,%esi     # Return point
    0x014:    halt
    0x020: .pos 0x20
    0x020: p: nop                # Procedure
    0x021:    nop
    0x022:    nop
    0x023:    ret
    0x024:    irmovl $1,%eax     # Should not be executed
    0x02a:    irmovl $2,%ecx     # Should not be executed
    0x030:    irmovl $3,%edx     # Should not be executed
    0x036:    irmovl $4,%ebx     # Should not be executed
    0x100: .pos 0x100
    0x100: Stack:                # Stack: Stack pointer

◼ Requires lots of nops to avoid data hazards

Incorrect Return Example
# demo-ret

    0x023:    ret                         F D E M W
    0x024:    irmovl $1,%eax    # Oops!     F D E M W
    0x02a:    irmovl $2,%ecx    # Oops!       F D E M W
    0x030:    irmovl $3,%edx    # Oops!         F D E M W
    0x00e:    irmovl $5,%esi    # Return          F D E M W

Pipeline state when ret reaches the write-back stage
    W: valM = 0x0e
    M: valE = 1, dstE = %eax
    E: valE ← 2, dstE = %ecx
    D: valC = 3, dstE = %edx
    F: valC ← 5, rB ← %esi

◼ Incorrectly execute 3 instructions following ret
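The two penalties seen so far (two instructions after a mispredicted branch, three after ret) come from the same arithmetic. The Python sketch below is a back-of-the-envelope model, not part of the slides: `wrongly_fetched` is a hypothetical helper, and it assumes one instruction is fetched per cycle and that the correct next PC becomes usable by fetch in the cycle after the control instruction passes through the resolving stage.

```python
STAGES = ["F", "D", "E", "M", "W"]


def wrongly_fetched(resolve_stage):
    """Instructions fetched on the wrong path behind a control instruction.

    Assumption: the correct next PC is produced while the control
    instruction is in `resolve_stage` and is first used by fetch in the
    following cycle.
    """
    fetch_cycle = 1
    resolve_cycle = fetch_cycle + STAGES.index(resolve_stage)
    correct_fetch_cycle = resolve_cycle + 1
    # Fetch slots strictly between the control instruction and the corrected fetch.
    return correct_fetch_cycle - fetch_cycle - 1


# jne: outcome known in execute; ret: return address read in memory.
print(wrongly_fetched("E"), wrongly_fetched("M"))  # prints: 2 3
```

This matches both traces: the fall-through instruction is fetched three cycles after `jne`, and the return-point instruction is fetched four cycles after `ret`.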

Handling Misprediction
Predict branch as taken
◼ Fetch 2 instructions at target
Cancel when mispredicted
◼ Detect branch not taken in execute stage
◼ On the following cycle, replace the instructions in execute and decode with bubbles
◼ No side effects have occurred yet

# demo-j.ys (Figure 4.63, p. 346)

                                                1 2 3 4 5 6 7 8 9 10
    0x000:    xorl %eax,%eax                    F D E M W
    0x002:    jne target        # Not taken       F D E M W
    0x011: t: irmovl $2,%edx    # Target            F D
              bubble                                    E M W
    0x017:    irmovl $3,%ebx    # Target+1            F
              bubble                                    D E M W
    0x007:    irmovl $1,%eax    # Fall through          F D E M W
    0x00d:    nop                                         F D E M W

Detecting Mispredicted Branch

Condition              Trigger
Mispredicted Branch    E_icode = IJXX & !e_Bch

[Figure 4.64, p. 347: pipeline hardware diagram showing the Fetch, Decode, Execute, Memory, and Write back stages; signals shown include e_Bch and M_Bch]
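The trigger condition is a pure combinational predicate over two signals. A minimal Python rendering is sketched below; `mispredicted_branch` is a hypothetical function name, and the value 7 for `IJXX` is the Y86 instruction code for conditional jumps, stated here as an assumption since the slide only uses the symbolic name.

```python
IJXX = 7  # assumed: Y86 instruction code for jXX


def mispredicted_branch(E_icode, e_Bch):
    """True when the instruction in execute is a jump that should not be taken.

    Branches are always predicted taken, so a jXX whose condition
    evaluates to false (e_Bch == False) was mispredicted.
    """
    return E_icode == IJXX and not e_Bch


print(mispredicted_branch(IJXX, False))  # prints: True
print(mispredicted_branch(IJXX, True))   # prints: False
```

Any instruction other than jXX never triggers the condition, whatever value `e_Bch` happens to carry.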

Control for Misprediction

[Figure 4.63, p. 346: trace of demo-j.ys in which the two instructions fetched at the branch target are replaced by bubbles]

Condition              F        D        E        M        W
Mispredicted Branch    normal   bubble   bubble   normal   normal
(Figure 4.66, p. 348)
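The control table can be read as a rule for how each pipeline register loads at the next clock edge. The Python sketch below applies it to the state of demo-j.ys in cycle 4, when `jne` is in execute. It is an illustration under simplifying assumptions: `clock` and `MISPREDICTED_BRANCH` are hypothetical names, each stage is modelled as holding just an instruction string, and only the "normal" and "bubble" actions from the table are supported.

```python
ORDER = ["F", "D", "E", "M", "W"]

# Register actions for the mispredicted-branch condition, from the table.
MISPREDICTED_BRANCH = {"F": "normal", "D": "bubble", "E": "bubble",
                       "M": "normal", "W": "normal"}


def clock(stages, fetched, actions):
    """Advance one cycle.

    `stages` maps each stage to the instruction it currently holds.
    A "normal" register takes the instruction from the previous stage;
    a "bubble" register is loaded with a bubble instead.
    """
    new = {"F": fetched}
    for prev, cur in zip(ORDER, ORDER[1:]):
        new[cur] = "bubble" if actions[cur] == "bubble" else stages[prev]
    return new


# Cycle 4 of demo-j.ys: jne is in execute and is found to be not taken.
cycle4 = {"F": "irmovl $3,%ebx", "D": "irmovl $2,%edx",
          "E": "jne t", "M": "xorl %eax,%eax", "W": None}
cycle5 = clock(cycle4, "irmovl $1,%eax", MISPREDICTED_BRANCH)
for stage in ORDER:
    print(stage, cycle5[stage])
```

After the clock edge the two target instructions are gone, `jne` and `xorl` proceed normally, and fetch picks up the fall-through instruction.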