
Modern Computer Architecture course slides (international student edition), Lecture 4: Speculative Execution

Document information
Format: PDF
Pages: 120
File size: 4.07 MB

Computer Architecture: Speculative Execution


Review: Branch Types

Type            Direction at fetch time   Number of possible next fetch addresses   When is next fetch address resolved?
Conditional     Unknown                   2                                         Execution (register dependent)
Unconditional   Always taken              1                                         Decode (PC + offset)
Call            Always taken              1                                         Decode (PC + offset)
Return          Always taken              Many                                      Execution (register dependent)
Indirect        Always taken              Many                                      Execution (register dependent)

Different branch types can be handled differently.
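The "number of possible next fetch addresses" column can be illustrated with a small sketch of what the decode stage knows about each branch type. The names `BranchType` and `candidate_next_fetch` are hypothetical, and the sketch assumes PC-relative targets are computed as PC + offset relative to the branch's own PC (ISAs differ on this detail).

```python
from __future__ import annotations

from enum import Enum


class BranchType(Enum):
    CONDITIONAL = "conditional"
    UNCONDITIONAL = "unconditional"
    CALL = "call"
    RETURN = "return"
    INDIRECT = "indirect"


def candidate_next_fetch(kind: BranchType, pc: int, inst_size: int,
                         offset: int | None = None) -> set[int] | None:
    """Possible next fetch addresses known after decode.

    Returns None when the target comes from a register value, i.e. it is
    only resolved at execution (hypothetical helper, not an ISA API).
    """
    if kind is BranchType.CONDITIONAL:
        # Direction is register dependent: fall-through or PC + offset (2 candidates).
        return {pc + inst_size, pc + offset}
    if kind in (BranchType.UNCONDITIONAL, BranchType.CALL):
        # Always taken and the target is PC + offset, so decode knows it exactly.
        return {pc + offset}
    # RETURN / INDIRECT: always taken, but the target is read from a register.
    return None


print(candidate_next_fetch(BranchType.CONDITIONAL, 0x100, 4, 0x20))
print(candidate_next_fetch(BranchType.RETURN, 0x100, 4))
```

Returning `None` for returns and indirect branches reflects the "Many" entries in the table: decode alone cannot narrow the target down.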


Review: How to Handle Control Dependences
• It is critical to keep the pipeline full with the correct sequence of dynamic instructions.
• Potential solutions if the instruction is a control-flow instruction:
  – Stall the pipeline until we know the next fetch address
  – Guess the next fetch address (branch prediction)
  – Employ delayed branching (branch delay slot)
  – Do something else (fine-grained multithreading)
  – Eliminate control-flow instructions (predicated execution)
  – Fetch from both possible paths, if the addresses of both paths are known (multipath execution)
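A rough cost model shows why guessing the next fetch address usually beats stalling. This is a sketch under simplifying assumptions: a base CPI of 1, and a fixed penalty (in cycles) paid on every branch when stalling but only on mispredicted branches when predicting. The function names and the parameter values in the usage lines are hypothetical.

```python
def cpi_stall(branch_fraction: float, penalty: int) -> float:
    """Stall on every control-flow instruction until the next fetch address is known."""
    return 1.0 + branch_fraction * penalty


def cpi_predict(branch_fraction: float, mispredict_rate: float, penalty: int) -> float:
    """Guess the next fetch address; pay the penalty only when the guess is wrong."""
    return 1.0 + branch_fraction * mispredict_rate * penalty


# Hypothetical parameters: 20% of instructions are branches, 2-cycle penalty,
# 30% of guesses wrong.
print(cpi_stall(0.20, 2))
print(cpi_predict(0.20, 0.30, 2))
```

Prediction wins whenever the misprediction rate is below 1, and the gap grows with deeper pipelines (larger penalties).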



Review: Branch Prediction
• Idea: Predict the next fetch address (to be used in the next cycle)
• Requires three things to be predicted at the fetch stage:
  – Whether the fetched instruction is a branch
  – (Conditional) branch direction
  – Branch target address (if taken)
• Observation: The target address remains the same for a conditional direct branch across dynamic instances
  – Idea: Store the target address from the previous instance and access it with the PC
  – Called the Branch Target Buffer (BTB) or Branch Target Address Cache
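The BTB idea (store the last target, look it up with the PC) can be sketched as a small cache. The slides do not specify an organization, so the direct-mapped layout, the 1024-entry default, the 4-byte instruction size, and the class name `BranchTargetBuffer` are all assumptions made for illustration.

```python
from __future__ import annotations


class BranchTargetBuffer:
    """Direct-mapped BTB sketch: indexed by low PC bits, tagged with the rest."""

    def __init__(self, num_entries: int = 1024, inst_size: int = 4):
        assert num_entries & (num_entries - 1) == 0, "num_entries must be a power of two"
        self.num_entries = num_entries
        self.inst_size = inst_size
        # Each entry is (tag, target) or None if the slot is empty.
        self.entries: list[tuple[int, int] | None] = [None] * num_entries

    def _index_tag(self, pc: int) -> tuple[int, int]:
        word = pc // self.inst_size  # drop byte-offset bits within an instruction
        return word % self.num_entries, word // self.num_entries

    def lookup(self, pc: int) -> int | None:
        """Return the stored target on a hit (PC believed to be a branch), else None."""
        index, tag = self._index_tag(pc)
        entry = self.entries[index]
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def update(self, pc: int, target: int) -> None:
        """Record the target of a resolved taken branch, evicting any conflicting entry."""
        index, tag = self._index_tag(pc)
        self.entries[index] = (tag, target)


btb = BranchTargetBuffer(num_entries=16)
btb.update(0x400, 0x380)       # a loop branch at 0x400 jumped back to 0x380
print(hex(btb.lookup(0x400)))  # later fetches of 0x400 hit and get the target
```

A BTB hit also answers the first of the three fetch-stage questions (is this instruction a branch?), since only branches are ever inserted.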


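Review: Fetch Stage with BTB
[Figure: the Program Counter (address of the current instruction) feeds a direction predictor (2-bit counters; output: taken?) and a cache of target addresses (BTB: Branch Target Buffer; outputs: hit? and target address). The Next Fetch Address is selected from the target address and PC + inst size.]
• Always-taken: CPI = [1 + (0.20 * 0.3) * 2] = 1.12 (70% of branches taken), where 0.20 is the fraction of instructions that are branches, 0.3 is the fraction of branches that are not taken and therefore mispredicted, and 2 is the misprediction penalty in cycles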
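The selection at the end of the fetch stage can be sketched as a single function: redirect to the BTB target only when the BTB hits and the direction predictor says taken, otherwise fall through to PC + inst size. The function name and the callable-based interface are hypothetical; the usage lines stand in a plain dict for the BTB and an always-taken predictor for the 2-bit counters.

```python
from typing import Callable, Optional


def next_fetch_address(
    pc: int,
    inst_size: int,
    btb_lookup: Callable[[int], Optional[int]],
    predict_taken: Callable[[int], bool],
) -> int:
    """Choose the next fetch address from the BTB target and PC + inst size."""
    target = btb_lookup(pc)  # None models a BTB miss
    if target is not None and predict_taken(pc):
        return target
    return pc + inst_size


# Toy usage: one known branch at 0x1010 whose last target was 0x1000.
btb = {0x1010: 0x1000}
always_taken = lambda pc: True
print(hex(next_fetch_address(0x1010, 4, btb.get, always_taken)))  # BTB hit, taken
print(hex(next_fetch_address(0x1014, 4, btb.get, always_taken)))  # BTB miss
```

A BTB miss falls through even under always-taken prediction, because without a stored target the fetch stage has nowhere else to go.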


Simple Branch Direction Prediction Schemes
• Compile time (static)
  – Always not taken
  – Always taken
  – BTFN (Backward taken, forward not taken)
  – Profile based (likely direction)
• Run time (dynamic)
  – Last time prediction (single-bit)
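Last time prediction keeps one bit per branch: the direction it went on its previous execution. The sketch below uses a dict keyed by branch PC in place of a hardware table (so it ignores aliasing), and the class name, the `accuracy` helper, and the default not-taken prediction for unseen branches are assumptions.

```python
from __future__ import annotations


class LastTimePredictor:
    """Single-bit predictor: predict the direction this branch took last time."""

    def __init__(self, default_taken: bool = False):
        self.last: dict[int, bool] = {}     # branch PC -> last outcome (the single bit)
        self.default_taken = default_taken  # prediction for a never-seen branch

    def predict(self, pc: int) -> bool:
        return self.last.get(pc, self.default_taken)

    def update(self, pc: int, taken: bool) -> None:
        self.last[pc] = taken


def accuracy(predictor, pc: int, outcomes: str) -> float:
    """Feed one branch's outcome string ('T' = taken, 'N' = not taken) through the predictor."""
    correct = 0
    for ch in outcomes:
        taken = ch == "T"
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(outcomes)


# A loop branch taken 9 times then exiting, with the loop executed twice.
print(accuracy(LastTimePredictor(), 0x40, "TTTTTTTTTN" * 2))
```

For such a loop branch the single bit mispredicts twice per loop execution (on the exit and again on the next entry), and on an alternating TNTN... pattern it is wrong every time.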


More Sophisticated Direction Prediction
• Compile time (static)
  – Always not taken
  – Always taken
  – BTFN (Backward taken, forward not taken)
  – Profile based (likely direction)
  – Program analysis based (likely direction)
• Run time (dynamic)
  – Last time prediction (single-bit)
  – Two-bit counter based prediction
  – Two-level prediction (global vs. local)
  – Hybrid
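Two-bit counter based prediction replaces the single bit with a saturating counter (0 and 1 predict not taken, 2 and 3 predict taken), so one anomalous outcome does not flip the prediction. This is a minimal sketch: the per-PC dict, the initial "weakly not taken" state of 1, and the names are assumptions, and the `accuracy` helper is repeated so the block runs on its own.

```python
from __future__ import annotations


class TwoBitCounterPredictor:
    """Per-branch 2-bit saturating counters: 0,1 -> not taken; 2,3 -> taken."""

    def __init__(self, initial: int = 1):
        self.counters: dict[int, int] = {}  # branch PC -> counter value in [0, 3]
        self.initial = initial              # assumed start state: weakly not taken

    def predict(self, pc: int) -> bool:
        return self.counters.get(pc, self.initial) >= 2

    def update(self, pc: int, taken: bool) -> None:
        c = self.counters.get(pc, self.initial)
        # Move one step toward the actual outcome, saturating at 0 and 3.
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)


def accuracy(predictor, pc: int, outcomes: str) -> float:
    """Feed one branch's outcome string ('T'/'N') through the predictor."""
    correct = 0
    for ch in outcomes:
        taken = ch == "T"
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(outcomes)


# Same loop branch as before: taken 9 times then exiting, loop executed twice.
print(accuracy(TwoBitCounterPredictor(), 0x40, "TTTTTTTTTN" * 2))
```

After warm-up, the loop exit only drops the counter from 3 to 2, so the next loop entry is still predicted taken: one misprediction per loop execution instead of two.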


Static Branch Prediction (I)
• Always not-taken
  – Simple to implement: no need for BTB, no direction prediction
  – Low accuracy: ~30-40%
  – Compiler can lay out code such that the likely path is the “not-taken” path
• Always taken
  – No direction prediction
  – Better accuracy: ~60-70%
  – Backward branches (i.e., loop branches) are usually taken
  – Backward branch: target address lower than branch PC
• Backward taken, forward not taken (BTFN)
  – Predict backward (loop) branches as taken, others not-taken
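The three static rules above need only the branch PC and target address, which a sketch can express as plain functions. The trace in the usage lines is hypothetical (a loop-closing backward branch taken 9 of 10 times and a forward branch taken 3 of 10 times), chosen only to show how BTFN combines the strengths of the other two rules; it is not measured data.

```python
def always_not_taken(branch_pc: int, target_pc: int) -> bool:
    return False


def always_taken(branch_pc: int, target_pc: int) -> bool:
    return True


def btfn(branch_pc: int, target_pc: int) -> bool:
    # Backward branch (target below the branch PC) is assumed to close a loop: predict taken.
    return target_pc < branch_pc


def static_accuracy(predict, trace) -> float:
    """trace: list of (branch_pc, target_pc, taken) tuples from a (hypothetical) run."""
    hits = sum(predict(pc, target) == taken for pc, target, taken in trace)
    return hits / len(trace)


trace = ([(0x1040, 0x1000, True)] * 9 + [(0x1040, 0x1000, False)]    # backward loop branch
         + [(0x1020, 0x1030, True)] * 3 + [(0x1020, 0x1030, False)] * 7)  # forward branch
for name, rule in [("always not-taken", always_not_taken),
                   ("always taken", always_taken),
                   ("BTFN", btfn)]:
    print(name, static_accuracy(rule, trace))
```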


Static Branch Prediction (II)
• Profile-based
  – Idea: The compiler determines the likely direction for each branch using a profile run and encodes that direction as a hint bit in the branch instruction format.
  + Per-branch prediction (more accurate than the schemes on the previous slide) → accurate if the profile is representative!
  - Requires hint bits in the branch instruction format
  - Accuracy depends on dynamic branch behavior:
      TTTTTTTTTTNNNNNNNNNN → 50% accuracy
      TNTNTNTNTNTNTNTNTNTN → 50% accuracy
  - Accuracy depends on the representativeness of the profile input set
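The 50% figures follow from the hint bit being fixed per branch: whichever direction the compiler picks, half of the dynamic instances in both patterns go the other way. The sketch assumes the compiler picks the majority direction from the profile and breaks ties toward taken; the function names are hypothetical.

```python
from collections import Counter


def profile_hint(profile_outcomes: str) -> bool:
    """Compiler step: majority direction seen in the profile run (tie -> taken, an assumption)."""
    counts = Counter(profile_outcomes)
    return counts["T"] >= counts["N"]


def hint_accuracy(hint_taken: bool, actual_outcomes: str) -> float:
    """Fraction of dynamic instances that match the fixed hint bit."""
    predicted = "T" if hint_taken else "N"
    return sum(o == predicted for o in actual_outcomes) / len(actual_outcomes)


for pattern in ["TTTTTTTTTTNNNNNNNNNN", "TNTNTNTNTNTNTNTNTNTN"]:
    hint = profile_hint(pattern)  # profile input identical to the real input
    print(pattern, hint_accuracy(hint, pattern))

# An unrepresentative profile input: the hint is learned on one behavior, used on another.
print(hint_accuracy(profile_hint("TTTTTTTTTN"), "NNNNNNNNNT"))
```

The last line illustrates the final drawback on the slide: the hint is only as good as the profile input set.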
