Modern Computer Architecture: Course Slides (International Student Edition), Lecture 4: Speculative Execution

Computer Architecture: Speculative Execution

Review: Branch Types

Type           Direction at fetch time   Possible next fetch addresses   Next fetch address resolved at
Conditional    Unknown                   2                               Execution (register dependent)
Unconditional  Always taken              1                               Decode (PC + offset)
Call           Always taken              1                               Decode (PC + offset)
Return         Always taken              Many                            Execution (register dependent)
Indirect       Always taken              Many                            Execution (register dependent)

Different branch types can be handled differently.

Review: How to Handle Control Dependences
• Critical to keep the pipeline full with the correct sequence of dynamic instructions.
• Potential solutions if the instruction is a control-flow instruction:
  – Stall the pipeline until we know the next fetch address
  – Guess the next fetch address (branch prediction)
  – Employ delayed branching (branch delay slot)
  – Do something else (fine-grained multithreading)
  – Eliminate control-flow instructions (predicated execution)
  – Fetch from both possible paths, if you know the addresses of both possible paths (multipath execution)

Review: Branch Prediction
• Idea: Predict the next fetch address (to be used in the next cycle)
• Requires three things to be predicted at fetch stage:
  – Whether the fetched instruction is a branch
  – (Conditional) branch direction
  – Branch target address (if taken)
• Observation: Target address remains the same for a conditional direct branch across dynamic instances
  – Idea: Store the target address from the previous instance and access it with the PC
  – Called Branch Target Buffer (BTB) or Branch Target Address Cache
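The BTB idea above can be sketched in a few lines of Python. This is a minimal illustration, not the design from the lecture: the direct-mapped organization, the 16-entry size, the 4-byte instruction size, and the names `BranchTargetBuffer` and `next_fetch_address` are all assumptions made for the example.

```python
class BranchTargetBuffer:
    """Hypothetical direct-mapped BTB: maps a branch PC to its last seen target."""

    def __init__(self, num_entries=16):
        self.num_entries = num_entries
        # Each entry is None (invalid) or a (tag, target) pair.
        self.entries = [None] * num_entries

    def _index_and_tag(self, pc):
        word = pc >> 2  # assumes 4-byte, word-aligned instructions
        return word % self.num_entries, word // self.num_entries

    def lookup(self, pc):
        """Return the stored target on a hit, or None on a miss."""
        index, tag = self._index_and_tag(pc)
        entry = self.entries[index]
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def update(self, pc, target):
        """Record the target of a resolved taken branch."""
        index, tag = self._index_and_tag(pc)
        self.entries[index] = (tag, target)


def next_fetch_address(btb, pc, predicted_taken, inst_size=4):
    """Use the BTB target only on a hit that is also predicted taken."""
    target = btb.lookup(pc)
    if target is not None and predicted_taken:
        return target
    return pc + inst_size
```

A BTB hit also answers the first question on the slide (whether the fetched instruction is a branch), since only branches are ever inserted into the buffer.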

Review: Fetch Stage with BTB
[Figure: the Program Counter (address of the current instruction) indexes both the direction predictor (2-bit counters), which outputs "taken?", and the cache of target addresses (BTB: Branch Target Buffer), which outputs "hit?" and the target address. The Next Fetch Address is selected between the target address and PC + inst size.]
Always-taken CPI = [1 + (0.20 * 0.3) * 2] = 1.12 (70% of branches taken)
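The CPI figure can be reproduced with a short calculation. The reading of the three numbers is an assumption, since the slide does not label them: 0.20 is taken to be the fraction of instructions that are branches, 0.3 the misprediction rate of always-taken (the 30% of branches that are not taken), and 2 the misprediction penalty in cycles. The function name `cpi_with_branch_penalty` is hypothetical.

```python
def cpi_with_branch_penalty(branch_fraction, mispredict_rate, penalty_cycles,
                            base_cpi=1.0):
    """CPI = base CPI + (mispredicted branches per instruction) * penalty."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Always-taken with 70% of branches taken: 30% of branches are mispredicted.
always_taken_cpi = cpi_with_branch_penalty(0.20, 0.3, 2)
print(f"{always_taken_cpi:.2f}")
```

Under the same assumed parameters, always not-taken would mispredict the 70% of branches that are taken, giving 1 + (0.20 * 0.7) * 2 = 1.28.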

Simple Branch Direction Prediction Schemes
• Compile time (static)
  – Always not taken
  – Always taken
  – BTFN (Backward taken, forward not taken)
  – Profile based (likely direction)
• Run time (dynamic)
  – Last time prediction (single-bit)
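Last-time prediction can be sketched as one bit per branch that remembers the most recent outcome. The per-PC dictionary, the initial "not taken" guess, and the names `LastTimePredictor` and `count_correct` are assumptions made for this illustration; real hardware would use a finite table indexed by PC bits.

```python
class LastTimePredictor:
    """Hypothetical single-bit predictor: predict whatever the branch did last time."""

    def __init__(self, initial_taken=False):
        self.last_outcome = {}  # branch PC -> last observed direction
        self.initial_taken = initial_taken  # assumed guess for unseen branches

    def predict(self, pc):
        return self.last_outcome.get(pc, self.initial_taken)

    def update(self, pc, taken):
        self.last_outcome[pc] = taken


def count_correct(predictor, pc, outcomes):
    """Replay a string of 'T'/'N' outcomes and count correct predictions."""
    correct = 0
    for ch in outcomes:
        taken = (ch == "T")
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct
```

With these assumptions, a branch that alternates (TNTN...) is mispredicted every time, while a long run of one direction is mispredicted only when the direction changes.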

More Sophisticated Direction Prediction
• Compile time (static)
  – Always not taken
  – Always taken
  – BTFN (Backward taken, forward not taken)
  – Profile based (likely direction)
  – Program analysis based (likely direction)
• Run time (dynamic)
  – Last time prediction (single-bit)
  – Two-bit counter based prediction
  – Two-level prediction (global vs. local)
  – Hybrid
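Two-bit counter based prediction can be sketched as a saturating counter per branch. The encoding below (states 0 to 3, predict taken when the state is 2 or 3), the initial "weakly taken" state, and the names `TwoBitPredictor` and `count_correct` are assumptions for this illustration rather than details given in the slides.

```python
class TwoBitPredictor:
    """Hypothetical two-bit saturating counter per branch.

    Assumed encoding: 0 = strongly not taken, 1 = weakly not taken,
    2 = weakly taken, 3 = strongly taken.
    """

    def __init__(self, initial_state=2):
        self.counters = {}  # branch PC -> counter state
        self.initial_state = initial_state

    def predict(self, pc):
        return self.counters.get(pc, self.initial_state) >= 2

    def update(self, pc, taken):
        state = self.counters.get(pc, self.initial_state)
        # Move toward the observed direction, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
        self.counters[pc] = state


def count_correct(predictor, pc, outcomes):
    """Replay a string of 'T'/'N' outcomes and count correct predictions."""
    correct = 0
    for ch in outcomes:
        taken = (ch == "T")
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct


loop_outcomes = "TTTN" * 5  # a 4-iteration loop executed 5 times
print(count_correct(TwoBitPredictor(), 0x100, loop_outcomes), "of", len(loop_outcomes))
```

The point of the second bit is hysteresis: a single not-taken outcome at a loop exit moves the counter from strongly to weakly taken, so the branch is still predicted taken when the loop is entered again.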

Static Branch Prediction (I)
• Always not-taken
  – Simple to implement: no need for BTB, no direction prediction
  – Low accuracy: ~30-40%
  – Compiler can lay out code such that the likely path is the "not-taken" path
• Always taken
  – No direction prediction
  – Better accuracy: ~60-70%
    • Backward branches (i.e. loop branches) are usually taken
    • Backward branch: target address lower than branch PC
• Backward taken, forward not taken (BTFN)
  – Predict backward (loop) branches as taken, others not-taken
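The BTFN rule reduces to a single address comparison, using the slide's definition of a backward branch (target address lower than branch PC). The function name `btfn_predict_taken` and the example addresses are hypothetical.

```python
def btfn_predict_taken(branch_pc, target_pc):
    """Backward taken, forward not taken: predict taken only for backward branches."""
    return target_pc < branch_pc


# A loop-closing branch jumps back to the top of the loop: predicted taken.
print(btfn_predict_taken(branch_pc=0x120, target_pc=0x100))
# A forward branch that skips over a block: predicted not taken.
print(btfn_predict_taken(branch_pc=0x120, target_pc=0x140))
```

Unlike always not-taken, this scheme needs the target address (or at least the sign of the offset) before it can make a prediction.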

Static Branch Prediction (II)
• Profile-based
  – Idea: Compiler determines likely direction for each branch using a profile run. Encodes that direction as a hint bit in the branch instruction format.
  + Per branch prediction (more accurate than schemes in previous slide) → accurate if profile is representative!
  - Requires hint bits in the branch instruction format
  - Accuracy depends on dynamic branch behavior:
      TTTTTTTTTTNNNNNNNNNN → 50% accuracy
      TNTNTNTNTNTNTNTNTNTN → 50% accuracy
  - Accuracy depends on the representativeness of profile input set
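The two 50% figures can be checked with a short script. This is a sketch under assumptions: the hint is chosen as the majority direction of the profile run, ties are broken toward "taken", the profile run is the same outcome sequence as the measured run, and the names `best_static_hint` and `static_accuracy` are hypothetical.

```python
def best_static_hint(profile_outcomes):
    """Pick the majority direction seen in the profile run (ties -> taken, by assumption)."""
    taken_count = profile_outcomes.count("T")
    return taken_count * 2 >= len(profile_outcomes)


def static_accuracy(hint_taken, outcomes):
    """Fraction of dynamic instances for which the fixed hint bit is correct."""
    correct = sum((ch == "T") == hint_taken for ch in outcomes)
    return correct / len(outcomes)


for pattern in ("T" * 10 + "N" * 10, "TN" * 10):
    hint = best_static_hint(pattern)
    print(pattern, static_accuracy(hint, pattern))
```

Both patterns are half taken and half not taken, so any single hint bit is right exactly half the time, regardless of how the outcomes are ordered. A dynamic predictor can exploit the ordering; a static hint cannot.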