Fudan University: Computer System course PPT slides, Pipelined Implementation Part I

Pipelined Implementation Part I

Overview
General Principles of Pipelining
◼ Goal
◼ Difficulties
Creating a Pipelined Y86 Processor
◼ Rearranging SEQ
◼ Inserting pipeline registers
◼ Problems with data and control hazards

Suggested Reading: Chap 4.3.5, 4.4, 4.5

SEQ Hardware (Review)
◼ Stages occur in sequence
◼ One operation in process at a time
[Figure 4.21, P293: SEQ hardware, with stages Fetch, Decode, Execute, Memory, Write back]

SEQ+ Hardware
◼ Still sequential implementation
◼ Reorder PC stage to put at beginning
PC Stage
◼ Task is to select PC for current instruction
◼ Based on results computed by previous instruction
Processor State
◼ PC is no longer stored in register
◼ But, can determine PC based on other stored information
[Figure: SEQ+ hardware, with the PC stage at the beginning, fed by pIcode, pBch, pValM, pValC, pValP]
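The PC-selection task described for SEQ+ can be sketched in Python. This is only an illustration, not the course's HCL: the function name `select_pc` and the instruction-code constants are assumptions, and the case mapping (call and taken jump use the saved valC, ret uses the saved valM, everything else uses the saved valP) is assumed from the usual Y86 design rather than stated on the slide.

```python
# Assumed Y86 instruction codes, used only for this sketch.
IJXX, ICALL, IRET = 0x7, 0x8, 0x9

def select_pc(p_icode, p_bch, p_val_m, p_val_c, p_val_p):
    """Pick the PC of the current instruction from values saved by the previous one."""
    if p_icode == ICALL:
        return p_val_c   # call: continue at the call target
    if p_icode == IJXX and p_bch:
        return p_val_c   # conditional jump that was taken
    if p_icode == IRET:
        return p_val_m   # ret: return address read from memory
    return p_val_p       # otherwise: next sequential instruction
```

For example, `select_pc(IJXX, False, 0, 0x200, 0x105)` returns the fall-through address `0x105`, since the jump was not taken.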

Problems with SEQ and SEQ+
Too slow
◼ Too many tasks must finish in one clock cycle
◼ Signals need a long time to propagate through all of the stages
◼ The clock must run slowly enough to allow for this
Does not make good use of hardware units
◼ Each unit is active for only part of the total clock cycle

Real-World Pipelines: Car Washes
[Figure: car washes arranged as Sequential, Parallel, and Pipelined]
Idea
◼ Divide process into independent stages
◼ Move objects through stages in sequence
◼ At any given time, multiple objects are being processed

Computational Example
[Figure 4.32, P310: combinational logic (300 ps) followed by a register (20 ps), driven by a clock; Delay = 320 ps, Throughput = 3.12 GOPS]
System
◼ Computation requires a total of 300 picoseconds
◼ Additional 20 picoseconds to save result in register
◼ Must have clock cycle of at least 320 ps
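The throughput figure for this unpipelined system follows directly from the clock period: one operation completes every 320 ps. A minimal sketch of that arithmetic, where the helper name `throughput_gops` is an assumption made for illustration:

```python
def throughput_gops(cycle_ps):
    """Operations per second, in units of 10^9, for a clock period in picoseconds."""
    # One operation per cycle: 1 / (cycle_ps * 1e-12 s) = (1000 / cycle_ps) * 1e9 ops/s.
    return 1000.0 / cycle_ps

logic_ps = 300      # combinational logic delay
register_ps = 20    # time to save the result in the register
cycle_ps = logic_ps + register_ps

print(cycle_ps, throughput_gops(cycle_ps))  # prints: 320 3.125
```

The exact value is 3.125 GOPS, which the slide lists as 3.12.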

3-Way Pipelined Version
[Figure 4.33 A), P310: combinational logic blocks A, B, C (100 ps each), each followed by a register (20 ps); Delay = 360 ps, Throughput = 8.33 GOPS]
System
◼ Divide combinational logic into 3 blocks of 100 ps each
◼ Can begin new operation as soon as previous one passes through stage A
  ⚫ Begin new operation every 120 ps
◼ Overall latency increases
  ⚫ 360 ps from start to finish
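The cycle time, latency, and throughput of the 3-way pipeline can be reproduced with a short calculation. The function `pipeline_timing` is a hypothetical helper; taking the slowest stage as the one that sets the clock is an assumption that makes no difference here, because all three stages take 100 ps.

```python
def pipeline_timing(stage_logic_ps, register_ps):
    """Return (cycle in ps, latency in ps, throughput in GOPS) for a pipeline."""
    # The clock must cover the slowest stage plus the register delay.
    cycle = max(stage_logic_ps) + register_ps
    # Each operation spends one full cycle in every stage.
    latency = cycle * len(stage_logic_ps)
    # One operation completes per cycle once the pipeline is full.
    throughput = 1000.0 / cycle
    return cycle, latency, throughput

cycle, latency, gops = pipeline_timing([100, 100, 100], 20)
print(cycle, latency, round(gops, 2))  # prints: 120 360 8.33
```

Calling `pipeline_timing([300], 20)` gives the unpipelined case (320 ps cycle, 320 ps latency), which shows the trade: throughput rises by a factor of about 2.67 while latency grows from 320 ps to 360 ps.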

Pipeline Diagrams
Unpipelined
◼ Cannot start new operation until previous one completes
3-Way Pipelined
◼ Up to 3 operations in process simultaneously
[Figure 4.33 B), P310: timelines for OP1, OP2, OP3; unpipelined operations run back to back, pipelined operations each pass through stages A, B, C, offset by one stage]
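A pipelined diagram of this kind can be generated as text. The sketch below is illustrative only: `pipeline_diagram` is a hypothetical helper, each column is one clock cycle, and a dot marks a cycle in which the operation is not in the pipeline.

```python
def pipeline_diagram(num_ops, stages=("A", "B", "C")):
    """Return one text row per operation; column t shows the stage occupied in cycle t."""
    total_cycles = num_ops + len(stages) - 1
    rows = []
    for op in range(num_ops):
        cells = ["."] * total_cycles
        # Operation `op` enters the first stage in cycle `op`, then advances one stage per cycle.
        for s, name in enumerate(stages):
            cells[op + s] = name
        rows.append(f"OP{op + 1} " + " ".join(cells))
    return rows

for row in pipeline_diagram(3):
    print(row)
# OP1 A B C . .
# OP2 . A B C .
# OP3 . . A B C
```

Reading down the third column shows the case of three operations in process at once: OP1 is in stage C, OP2 in stage B, and OP3 in stage A.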