Modern Computer Architecture, course slides (English lecture notes): Lecture 04, Memory Data Prefetching

Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing
Lecture 4: Memory Data Prefetching

Prefetching (1/3)
• Fetch block ahead of demand
• Target compulsory, capacity, (& coherence) misses
  - Why not conflict?
• Big challenges:
  - Knowing "what" to fetch
    · Fetching useless blocks wastes resources
  - Knowing "when" to fetch
    · Fetching too early clutters storage (or the block gets thrown out before use)
    · Fetching too late defeats the purpose of "pre"-fetching

Prefetching (2/3)
[Figure: load timelines along a time axis]
• Without prefetching: the Load goes through L1, L2 and DRAM before the Data returns (total load-to-use latency)
• With prefetching: the Prefetch is issued early and the Data arrives before the Load (much improved load-to-use latency)
• Or: the Prefetch is issued late and the Load still waits for part of the fetch (somewhat improved latency)
Prefetching must be accurate and timely
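The three timelines can be summarised with a simple model. This model is an assumption made for illustration and is not stated on the slide: if the prefetch is issued `lead` cycles before the demand load and a miss costs `miss_latency` cycles, the load waits for whatever part of the miss has not yet elapsed, but never less than a cache hit. The function name and the cycle counts are hypothetical.

```c
#include <assert.h>

/* Cycles the demand load waits, under a simple illustrative model:
 * the prefetch starts `lead` cycles before the load, a miss takes
 * `miss_latency` cycles, and a cache hit costs `hit_latency` cycles. */
unsigned load_to_use_latency(unsigned miss_latency, unsigned hit_latency,
                             unsigned lead)
{
    assert(hit_latency <= miss_latency);
    if (lead >= miss_latency - hit_latency)
        return hit_latency;        /* timely: the data is already cached */
    return miss_latency - lead;    /* late or absent: wait for the remainder */
}
```

With a 100-cycle miss and a 4-cycle hit, `lead = 0` corresponds to the first timeline, a large `lead` to the second, and an intermediate `lead` such as 40 to the "Or" case.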

Prefetching (3/3)
[Figure: execution timelines without and with prefetching, with segments labeled Run and Load along a time axis]
Prefetching removes loads from the critical path

Common "Types" of Prefetching
• Software
• Next-Line, Adjacent-Line
• Next-N-Line
• Stream Buffers
• Stride
• "Localized" (e.g., PC-based)
• Pointer
• Correlation

Software Prefetching (1/4)
• Compiler/programmer places prefetch instructions
• Put prefetched value into...
  - Register (binding, also called "hoisting")
    · May prevent instructions from committing
  - Cache (non-binding)
    · Requires ISA support
    · May get evicted from cache before demand

Software Prefetching (2/4)
[Figure: three versions of a code sequence spanning blocks A, B and C; cache misses shown in red]
• Original: R1 = [R2] is immediately followed by its consumer, R3 = R1+4
• Hoisted load: R1 = [R2] is moved earlier, but an R1 = R1-1 lies on the path to R3 = R1+4
  - Hoisting must be aware of dependencies
• Prefetch: PREFETCH[R2] is placed early, while R1 = R1-1, R1 = [R2] and R3 = R1+4 stay in order
  - Using a prefetch instruction can avoid problems with data dependencies
Hopefully the load miss is serviced by the time we get to the consumer
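The dependency problem on this slide can be sketched in C. This is one reading of the figure, not a transcription of it: the three functions mirror the register sequences, the function names are hypothetical, and `__builtin_prefetch` is assumed to be available (it is a GCC/Clang builtin that acts only as a hint).

```c
#include <assert.h>
#include <stddef.h>

/* Original order: the load sits right before its consumer. */
int original(const int *r2, int r1)
{
    assert(r2 != NULL);
    r1 = r1 - 1;        /* R1 = R1 - 1 */
    r1 = *r2;           /* R1 = [R2], may miss in the cache */
    return r1 + 4;      /* R3 = R1 + 4 */
}

/* Binding prefetch (hoisting) done carelessly: the loaded value is
 * clobbered by the decrement before the consumer reads it. */
int hoisted_naively(const int *r2, int r1)
{
    assert(r2 != NULL);
    r1 = *r2;           /* load moved above the decrement */
    r1 = r1 - 1;
    return r1 + 4;      /* now computes [R2] - 1 + 4 */
}

/* Non-binding prefetch: only warms the cache, so no register changes
 * and the original instruction order stays intact. */
int with_prefetch(const int *r2, int r1)
{
    assert(r2 != NULL);
    __builtin_prefetch(r2);   /* hint only; does not write R1 */
    r1 = r1 - 1;
    r1 = *r2;
    return r1 + 4;
}
```

For a memory value of 10, `original` and `with_prefetch` both return 14, while `hoisted_naively` returns 13, which is why a hoisted load needs dependency analysis and a prefetch instruction does not.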

Software Prefetching (3/4)
    for (I = 1; I < rows; I++) {
        for (J = 1; J < columns; J++) {
            prefetch(&x[I+1][J]);    /* request the element one row ahead */
            sum = sum + x[I][J];
        }
    }
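A compilable variant of this loop might look as follows. It is a minimal sketch under a few assumptions: the slide's generic `prefetch()` is replaced by the GCC/Clang builtin `__builtin_prefetch`, the matrix is stored row-major in a flat array, indexing is 0-based, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sums a rows x cols row-major matrix while prefetching the element
 * one row ahead of the one being added. */
double sum_with_prefetch(const double *x, size_t rows, size_t cols)
{
    assert(x != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            /* Skip the hint on the last row so the address stays in bounds. */
            if (i + 1 < rows)
                __builtin_prefetch(&x[(i + 1) * cols + j]);
            sum += x[i * cols + j];
        }
    }
    return sum;
}
```

The prefetch is non-binding, so the result is identical to a plain summation; only the timing can change. Neighbouring elements usually share a cache line, so issuing one prefetch per line rather than per element would reduce the instruction overhead.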

Software Prefetching (4/4)
• Pros:
  - Gives programmer control and flexibility
  - Allows time for complex (compiler) analysis
  - No (major) hardware modifications needed
• Cons:
  - Hard to perform timely prefetches
    · At IPC=2 and 100-cycle memory → move load 200 inst. earlier
    · Might not even have 200 inst. in current function
  - Prefetching earlier and more often leads to low accuracy
    · Program may go down a different path
  - Prefetch instructions increase code footprint
    · May cause more I$ misses, code alignment issues
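The timeliness figure above is simply IPC multiplied by memory latency. The sketch below spells out that arithmetic; the conversion to loop iterations is an added illustration rather than something on the slide, and both function names are hypothetical.

```c
#include <assert.h>

/* Instructions by which a prefetch must lead its load to hide the miss:
 * distance = IPC * memory latency in cycles. */
unsigned prefetch_distance_insts(unsigned ipc, unsigned mem_latency_cycles)
{
    return ipc * mem_latency_cycles;
}

/* The same lead expressed in loop iterations, rounded up. */
unsigned prefetch_distance_iters(unsigned distance_insts,
                                 unsigned insts_per_iter)
{
    assert(insts_per_iter > 0);
    return (distance_insts + insts_per_iter - 1) / insts_per_iter;
}
```

For IPC=2 and a 100-cycle memory this gives 200 instructions; with an assumed loop body of 8 instructions, the prefetch would need to run about 25 iterations ahead.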

Hardware Prefetching (1/3)
• Hardware monitors memory accesses
  - Looks for common patterns
• Guessed addresses are placed into prefetch queue
  - Queue is checked when no demand accesses are waiting
• Prefetches look like READ requests to the hierarchy
  - Although they may get a special "prefetched" flag in the state bits
• Prefetchers trade bandwidth for latency
  - Extra bandwidth used only when guessing incorrectly
  - Latency reduced only when guessing correctly
• No need to change software
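The queueing behaviour described above can be modelled in a few lines of C. This is a toy software model, not a description of any real prefetcher: the queue capacity, the type and function names, and the next-block guess standing in for pattern detection are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PQ_CAPACITY 8   /* hypothetical queue depth */

typedef struct {
    uint64_t addr[PQ_CAPACITY];
    unsigned head, count;
} prefetch_queue;

/* Enqueue a guessed block address; the guess is dropped if the queue is full. */
bool pq_push(prefetch_queue *q, uint64_t block_addr)
{
    assert(q != NULL);
    if (q->count == PQ_CAPACITY)
        return false;
    q->addr[(q->head + q->count) % PQ_CAPACITY] = block_addr;
    q->count++;
    return true;
}

/* Called once per cycle. A prefetch is issued only when no demand access
 * is waiting; returns true and stores the address in *issued in that case. */
bool pq_tick(prefetch_queue *q, bool demand_waiting, uint64_t *issued)
{
    assert(q != NULL && issued != NULL);
    if (demand_waiting || q->count == 0)
        return false;
    *issued = q->addr[q->head];
    q->head = (q->head + 1) % PQ_CAPACITY;
    q->count--;
    return true;
}

/* Stand-in for pattern detection: guess the block after the one accessed. */
void observe_access(prefetch_queue *q, uint64_t block_addr)
{
    pq_push(q, block_addr + 1);
}
```

After `observe_access(&q, 100)`, a call to `pq_tick` with `demand_waiting` set issues nothing, and the next call with it cleared issues block 101. Giving demand accesses priority is what keeps prefetches from delaying real loads.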