Modern Computer Architecture: Course Slides (International Student Edition), Lecture 3: Pipelining

Computer Architecture: Pipelining

Can We Do Better?
• What limitations do you see with the multi-cycle design?
• Limited concurrency
  - Some hardware resources are idle during different phases of the instruction processing cycle
  - "Fetch" logic is idle when an instruction is being "decoded" or "executed"
  - Most of the datapath is idle when a memory access is happening

Can We Use the Idle Hardware to Improve Concurrency?
• Goal: Concurrency → throughput (more "work" completed in one cycle)
• Idea: When an instruction is using some resources in its processing phase, process other instructions on idle resources not needed by that instruction
  - E.g., when an instruction is being decoded, fetch the next instruction
  - E.g., when an instruction is being executed, decode another instruction
  - E.g., when an instruction is accessing data memory (ld/st), execute the next instruction
  - E.g., when an instruction is writing its result into the register file, access data memory for the next instruction

Pipelining: Basic Idea
• More systematically:
  - Pipeline the execution of multiple instructions
  - Analogy: "Assembly line processing" of instructions
• Idea:
  - Divide the instruction processing cycle into distinct "stages" of processing
  - Ensure there are enough hardware resources to process one instruction in each stage
  - Process a different instruction in each stage
• Instructions consecutive in program order are processed in consecutive stages
• Benefit: Increases instruction processing throughput (1/CPI)
• Downside: Start thinking about this.

Example: Execution of Four Independent ADDs
• Multi-cycle: 4 cycles per instruction
• Pipelined: 4 cycles per 4 instructions (steady state)
[Figure: two timing diagrams over time. Multi-cycle: the four instructions pass through F D E W one after another. Pipelined: the F D E W sequences of the four instructions overlap.]
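The two timing diagrams can be reproduced with a small sketch. It assumes that F, D, E, W stand for fetch, decode, execute, and write-back, that every stage takes one cycle, and that the four ADDs never stall; the function names are hypothetical and chosen only for this illustration.

```python
STAGES = ["F", "D", "E", "W"]  # assumed: fetch, decode, execute, write-back


def multicycle_starts(n_instr, stages=STAGES):
    """Start cycle of each instruction when one must finish before the next begins."""
    return [i * len(stages) for i in range(n_instr)]


def pipelined_starts(n_instr):
    """Start cycle of each instruction when a new one enters every cycle (no stalls)."""
    return list(range(n_instr))


def render(starts, stages=STAGES):
    """Return (rows, total_cycles): one text row per instruction, one column per cycle."""
    total = max(starts) + len(stages)
    rows = []
    for start in starts:
        cells = ["."] * total
        for offset, name in enumerate(stages):
            cells[start + offset] = name
        rows.append(" ".join(cells))
    return rows, total


for label, starts in (("multi-cycle", multicycle_starts(4)),
                      ("pipelined", pipelined_starts(4))):
    rows, total = render(starts)
    print(f"{label}: {total} cycles for 4 instructions")
    print("\n".join(rows))
```

Under these assumptions the multi-cycle design needs 16 cycles for the four ADDs, while the pipeline needs 7: four cycles to fill, then one instruction completes per cycle, which is the steady-state rate quoted on the slide.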

The Laundry Analogy
• "place one dirty load of clothes in the washer"
• "when the washer is finished, place the wet load in the dryer"
• "when the dryer is finished, take out the dry load and fold"
• "when folding is finished, ask your roommate (??) to put the clothes away"
- steps to do a load are sequentially dependent
- no dependence between different loads
- different steps do not share resources
[Figure: timelines from 6 PM to 2 AM for loads A, B, C, D in task order]

Pipelining Multiple Loads of Laundry
[Figure: timelines from 6 PM to 2 AM for loads A, B, C, D in task order, without and with pipelining]
- 4 loads of laundry in parallel
- no additional resources
- throughput increased by 4
- latency per load is the same
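A quick calculation shows why latency stays the same while throughput approaches a factor of 4. This is a sketch that assumes every one of the four steps takes 30 minutes, which is consistent with the 6 PM to 2 AM axis for four sequential loads but is not stated on the slide; the function names are hypothetical.

```python
def serial_hours(n_loads, n_steps, step_hours):
    """Total time when each load finishes all steps before the next load starts."""
    return n_loads * n_steps * step_hours


def pipelined_hours(n_loads, n_steps, step_hours):
    """Total time when a new load starts as soon as the first step is free.

    The first load needs n_steps steps; every later load finishes one step after it.
    """
    return (n_steps + n_loads - 1) * step_hours


N_STEPS = 4       # wash, dry, fold, put away
STEP_HOURS = 0.5  # assumption: every step takes 30 minutes

for n_loads in (4, 100):
    s = serial_hours(n_loads, N_STEPS, STEP_HOURS)
    p = pipelined_hours(n_loads, N_STEPS, STEP_HOURS)
    print(f"{n_loads} loads: serial {s} h, pipelined {p} h, speedup {s / p:.2f}")
```

With these numbers each load still takes 2 hours from washer to closet. Four loads finish in 3.5 hours instead of 8, a speedup of about 2.29, and the speedup approaches 4 only as the number of loads grows, so the factor of 4 is a steady-state figure.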

Pipelining Multiple Loads of Laundry: In Practice
[Figure: timelines from 6 PM to 2 AM for loads A, B, C, D in task order]
the slowest step decides throughput

Pipelining Multiple Loads of Laundry: In Practice
[Figure: timelines from 6 PM to 2 AM for loads A, B, C, D in task order, with two dryers labeled A and B]
Throughput restored (2 loads per hour) using 2 dryers
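The bottleneck effect can be sketched as a capacity calculation. The step durations below are assumptions picked so that the result matches the 2 loads per hour quoted on the slide (30 minutes for every step except a 1-hour dryer); the slide itself does not give them, and `throughput_per_hour` is a hypothetical helper.

```python
def throughput_per_hour(stages):
    """Steady-state loads per hour for a pipeline.

    `stages` maps a step name to (hours per load, number of identical units).
    Each step can handle units / hours loads per hour, and the pipeline as a
    whole is capped by the step with the lowest capacity.
    """
    return min(units / hours for hours, units in stages.values())


# Assumed step times; only the dryer is slower than the other steps.
one_dryer = {
    "wash": (0.5, 1),
    "dry": (1.0, 1),
    "fold": (0.5, 1),
    "put away": (0.5, 1),
}
two_dryers = {**one_dryer, "dry": (1.0, 2)}

print(throughput_per_hour(one_dryer))   # limited by the single dryer
print(throughput_per_hour(two_dryers))  # dryer capacity now matches the other steps
```

Under these assumptions a single dryer limits the pipeline to 1 load per hour, and a second dryer brings it back to 2 loads per hour. The same reasoning applies to a processor pipeline: duplicating or splitting the slowest stage is what raises throughput.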

An Ideal Pipeline
• Goal: Increase throughput with little increase in cost (hardware cost, in case of instruction processing)
• Repetition of identical operations
  - The same operation is repeated on a large number of different inputs
• Repetition of independent operations
  - No dependencies between repeated operations
• Uniformly partitionable suboperations
  - Processing can be evenly divided into uniform-latency suboperations (that do not share resources)
• Fitting examples: automobile assembly line, doing laundry
  - What about the instruction processing "cycle"?
