Modern Computer Architecture: Course Slides (English Lecture Notes), Lecture 10: Out of Order and Speculative Execution

Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing
Lecture 10: Speculation and Traps in Out-of-Order Cores

What is wrong with Tomasulo's?
• Branch instructions
  - Need branch prediction to guess what to fetch next
  - Need speculative execution to "clean up" wrong guesses
• Exceptions and traps ("software" interrupts)
  - Need to handle uncommon execution cases
    • Jump to a software handler
  - Should follow the insn. on which they were triggered
  - Often referred to as precise interrupts
• Don't know relative order of instructions in RS

Speculation and Precise Interrupts
• When branch is mis-speculated by predictor
  - Must reset state (e.g., regs) to time of branch
• Sequential semantics for interrupts
  - All insns. before interrupt should be complete
  - All insns. after interrupt should look as if never started (abort)
• What makes this difficult?
  - Younger insns. finish before branch → must undo writebacks
  - Older insns. not done when young branch resolves → must wait
    • Older insn. takes page fault or divide by zero → forget the branch
• Same problem → same solution
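The "same problem, same solution" point can be made concrete with a small sketch. It assumes each in-flight insn. carries a program-order sequence number (smaller means older); the function name `recover` and the event labels are hypothetical and not taken from the slides. Whatever the event is (mispredicted branch or exception), the oldest one wins, older insns. must finish, and younger insns. are aborted.

```python
def recover(in_flight, events):
    """Pick the event to handle and split the in-flight insns. around it.

    in_flight: sequence numbers of in-flight insns. (smaller = older).
    events:    dict mapping sequence number -> event kind
               (illustrative labels such as "mispredict" or "page_fault").
    Returns (seq, kind, must_finish, aborted).
    """
    seq = min(events)                                   # oldest event takes priority
    must_finish = [s for s in in_flight if s < seq]     # older: must complete
    aborted = [s for s in in_flight if s > seq]         # younger: look as if never started
    return seq, events[seq], must_finish, aborted


if __name__ == "__main__":
    # A young branch (4) mispredicts, but an older insn. (2) page-faults.
    print(recover([1, 2, 3, 4, 5], {4: "mispredict", 2: "page_fault"}))
```

In this example the page fault at insn. 2 is handled and the branch at insn. 4 lands in the aborted list, which is the "forget the branch" case above.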

Precise State
• Speculative execution requires
  - (Ability to) abort & restart at every branch
  - Abort & restart at every load (covered in later lecture)
• Synchronous (exception and trap) events require
  - Abort & restart at every load, store, divide, ...
• Asynchronous (hardware) interrupts require
  - Abort & restart at every ??
• Real world: bite the bullet
  - Implement abort & restart at every insn.
  - Called precise state

Precise State Implementation Options
• Imprecise state: ignore the problem!
  - Makes page faults (any restartable exceptions) difficult
  - Makes speculative execution practically impossible
• Force in-order completion (W): stall pipe if necessary
  - Slow (takes away benefit of out-of-order)
• Keep track of precise state in hardware
  - Reset current state from precise state when needed
• Everything is better in hardware
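The third option, "reset current state from precise state when needed", might be pictured as two copies of the register state. This is a deliberately simplified software sketch, not how hardware is organized: the class name `Machine` and its methods are hypothetical, and registers are modeled as a plain dictionary.

```python
class Machine:
    """Toy model: a speculative 'current' state plus an in-order 'precise' state."""

    def __init__(self, regs):
        self.precise = dict(regs)   # state as of the last in-order commit
        self.current = dict(regs)   # may contain out-of-order or speculative writes

    def speculative_write(self, reg, value):
        # Out-of-order results update only the current state.
        self.current[reg] = value

    def commit(self, reg, value):
        # Called in program order once an insn. is known to be safe.
        self.precise[reg] = value

    def abort(self):
        # Mis-speculation or exception: discard everything not yet committed.
        self.current = dict(self.precise)


if __name__ == "__main__":
    m = Machine({"r1": 0})
    m.speculative_write("r1", 5)   # wrong-path write
    m.abort()
    print(m.current)
```

Copying a whole register file on every abort would be costly; the sketch is only meant to show the invariant that the precise copy never sees uncommitted writes.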

Out-of-Order Topics
• "Scoreboarding"
  - First OoO, no register renaming
• "Tomasulo's algorithm"
  - OoO with register renaming
• Handling precise state and speculation
  - P6-style execution (Intel Pentium Pro)
  - R10k-style execution (MIPS R10k)
• Handling memory dependencies

The Problem with Precise State
[Figure: pipeline diagram with I$, BP, insn. buffer, regfile, and L1-D]
• Problem: writeback combines two functions
  - Forward values to younger insns.: out-of-order is OK
  - Write values to registers: needs to be in order
• Similar solution as for OoO decode
  - Split writeback into two stages

Re-Order Buffer (ROB)
[Figure: pipeline diagram with I$, BP, Re-Order Buffer (ROB), regfile, and L1-D]
• Insn. buffer → Re-Order Buffer (ROB)
  - Buffer completed results en route to register file
  - Can be merged with RS (RUU) or separate (common today)
• Split writeback (W) into two stages
  - Why is there no latch between W1 and W2?

Complete and Retire
[Figure: pipeline diagram with I$, BP, Re-Order Buffer (ROB), regfile, and L1-D; writeback split into stages C and R]
• Complete (C): insns. write results into ROB
  - Out-of-order: don't block younger insns.
• Retire (R): a.k.a. commit, graduate
  - ROB writes results to register file
  - In-order: stall back-propagates to younger insns.
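The split between Complete and Retire can be sketched as a small queue. This is a minimal illustration under assumptions: the names `RobEntry`, `ReorderBuffer`, `dispatch`, `complete`, and `retire` are hypothetical, register names are plain strings, and the ROB is unbounded for simplicity.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    dest: str                    # destination register name (illustrative)
    value: Optional[int] = None  # result, filled in at Complete
    done: bool = False           # set once the insn. has completed


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()   # left end is the head (oldest insn.)
        self.regfile = {}        # architectural (precise) register state

    def dispatch(self, dest):
        """Allocate an entry at the tail, in program order."""
        entry = RobEntry(dest)
        self.entries.append(entry)
        return entry

    def complete(self, entry, value):
        """Complete (C): write the result into the ROB, in any order."""
        entry.value = value
        entry.done = True

    def retire(self):
        """Retire (R): commit finished entries from the head, in order."""
        retired = []
        while self.entries and self.entries[0].done:
            entry = self.entries.popleft()
            self.regfile[entry.dest] = entry.value
            retired.append(entry.dest)
        return retired
```

If a younger insn. completes first, `retire()` returns nothing until the head entry is done; after that, both entries leave the ROB in program order and the register file only ever sees in-order writes.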

P6 Data Structures
• P6: Start with Tomasulo's algorithm... add ROB
• ROB (separate from RS)
  - head, tail: pointers maintain sequential order
  - R: insn. output register, V: insn. output value
• Tags are different
  - Tomasulo: RS# → P6: ROB#
• Map Table is different
  - T+: tag + "ready-in-ROB" bit
  - T==0 → Value is ready in register file
  - T!=0 → Value is not ready
  - T!=0+ → Value is ready in the ROB
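The three Map Table cases can be read as a lookup rule for source operands. The sketch below assumes ROB entries are numbered from 1 so that tag 0 can mean "no pending producer"; `MapEntry`, `read_operand`, and the dictionary-based register file and ROB are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class MapEntry:
    tag: int = 0                # ROB# of the producing insn.; 0 = none pending
    ready_in_rob: bool = False  # the "+" bit: result already sits in the ROB


def read_operand(reg, map_table, regfile, rob_values):
    """Return ("value", v) if the operand is available, else ("wait", rob_tag)."""
    entry = map_table.get(reg, MapEntry())
    if entry.tag == 0:
        # T == 0: value is ready in the register file
        return ("value", regfile[reg])
    if entry.ready_in_rob:
        # T != 0 with "+": value is ready in the ROB
        return ("value", rob_values[entry.tag])
    # T != 0: value is not ready; wait for the ROB# tag
    return ("wait", entry.tag)


if __name__ == "__main__":
    regfile = {"r1": 10, "r2": 20, "r3": 30}
    map_table = {"r2": MapEntry(tag=4), "r3": MapEntry(tag=5, ready_in_rob=True)}
    rob_values = {5: 99}
    for reg in ("r1", "r2", "r3"):
        print(reg, read_operand(reg, map_table, regfile, rob_values))
```

Here `r1` is read from the register file, `r2` must wait on ROB#4, and `r3` is read from ROB#5 even though it has not yet retired.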