
Modern Computer Architecture course slides (English lecture notes): Lecture 04, Memory Data Prefetching

Document information: PDF, 32 pages, 1.72 MB

Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing
Lecture 4: Memory Data Prefetching


Prefetching (1/3)
• Fetch block ahead of demand
• Target compulsory, capacity, (& coherence) misses
  – Why not conflict?
• Big challenges:
  – Knowing “what” to fetch
    • Fetching useless blocks wastes resources
  – Knowing “when” to fetch
    • Too early → clutters storage (or gets thrown out before use)
    • Fetching too late → defeats purpose of “pre”-fetching
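To make the “what” and “when” challenges concrete, here is a minimal C sketch that classifies the outcome of a single prefetch from a few event times. The enum names, the cycle-based timing, and the convention that a negative time means “never happens” are assumptions made for illustration, not part of the lecture.

```c
#include <assert.h>

/* Illustrative outcome classes for one prefetch (names are hypothetical). */
typedef enum {
    PF_USELESS,   /* block never demanded: wasted bandwidth and space ("what") */
    PF_TOO_EARLY, /* evicted before the demand access arrives ("when") */
    PF_LATE,      /* demand arrives before the fill completes; hides only part of the miss */
    PF_TIMELY     /* fill completes before demand and the block is still resident */
} pf_outcome;

/* All times are in cycles (assumed model).
   demand_time < 0: the block is never used.
   evict_time  < 0: the block is not evicted before it is used. */
pf_outcome classify_prefetch(long issue_time, long fill_latency,
                             long demand_time, long evict_time)
{
    assert(issue_time >= 0 && fill_latency >= 0);
    if (demand_time < 0)
        return PF_USELESS;
    if (evict_time >= 0 && evict_time < demand_time)
        return PF_TOO_EARLY;
    if (issue_time + fill_latency > demand_time)
        return PF_LATE;
    return PF_TIMELY;
}
```

With an assumed 100-cycle fill issued at cycle 0, a demand at cycle 150 is timely, a demand at cycle 60 is late, and a demand at cycle 900 after an eviction at cycle 400 means the prefetch came too early.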


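Prefetching (2/3)
[Figure: timelines of a load request travelling L1 → L2 → DRAM and the data returning; x-axis: time]
• Without prefetching: Load → L1 → L2 → DRAM → Data; the whole trip is the total load-to-use latency
• With prefetching: the Prefetch is issued early and the Data arrives before the Load → much improved load-to-use latency
• Or: the Prefetch is issued only shortly before the Load, so the Data arrives after it → somewhat improved latency
• Prefetching must be accurate and timely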
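The timelines above reduce to one line of arithmetic. This sketch assumes a single miss latency for the whole L1 → L2 → DRAM trip, a fixed hit latency once the block has arrived in the level the load probes first, and no queuing or bandwidth effects; `load_to_use_latency` and its parameters are hypothetical names.

```c
#include <assert.h>

/* Load-to-use latency of a demand load whose block was prefetched
   lead_cycles earlier (lead_cycles = 0: no prefetch). Whatever part of
   the miss is still in flight when the load issues is exposed; if the
   block has already arrived, the load sees only the hit latency. */
long load_to_use_latency(long miss_latency, long hit_latency, long lead_cycles)
{
    assert(miss_latency >= hit_latency && hit_latency >= 0 && lead_cycles >= 0);
    long remaining = miss_latency - lead_cycles;  /* part of the fill still in flight */
    return remaining > hit_latency ? remaining : hit_latency;
}
```

For example, with an assumed 200-cycle miss and 4-cycle hit, no prefetch gives 200 cycles, a prefetch 120 cycles ahead gives 80 (somewhat improved), and one 300 cycles ahead gives 4 (much improved).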


Prefetching (3/3)
[Figure: execution timelines of Run and Load phases, without prefetching and with prefetching; x-axis: time]
• Prefetching removes loads from critical path
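A rough way to see why removing loads from the critical path matters is to compare cycle counts for a loop of `n` iterations, each doing `run_cycles` of work and one missing load. This is an idealized model assumed for illustration: it ignores bandwidth limits and assumes enough misses can be in flight at once; the function names are hypothetical.

```c
#include <assert.h>

/* Without prefetching: each iteration runs, then stalls on its load,
   so every miss sits on the critical path. */
long cycles_without_prefetch(long n, long run_cycles, long miss_latency)
{
    assert(n >= 0 && run_cycles >= 0 && miss_latency >= 0);
    return n * (run_cycles + miss_latency);
}

/* Idealized prefetching: every load was prefetched far enough ahead,
   so only the first miss (with no earlier work to hide behind) stays
   on the critical path; the rest overlap with Run phases. */
long cycles_with_prefetch(long n, long run_cycles, long miss_latency)
{
    assert(n >= 0 && run_cycles >= 0 && miss_latency >= 0);
    return (n > 0 ? miss_latency : 0) + n * run_cycles;
}
```

With assumed values n = 1000, run_cycles = 50 and miss_latency = 200, the model gives 250,000 cycles without prefetching and 50,200 with it.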


Common “Types” of Prefetching
• Software
• Next-Line, Adjacent-Line
• Next-N-Line
• Stream Buffers
• Stride
• “Localized” (e.g., PC-based)
• Pointer
• Correlation
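Of the types listed, next-line and next-N-line are simple enough to show in a few lines. This is a minimal C sketch of candidate-address generation, assuming 64-byte blocks and a trigger on every demand access (real designs may trigger only on misses); `next_n_line` and `BLOCK_SIZE` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 64u  /* assumed cache block size in bytes */

/* Next-N-line: on an access to the block containing addr, generate the
   addresses of the next n sequential blocks (next-line is n = 1).
   Writes at most max_out candidates into out and returns how many. */
size_t next_n_line(uint64_t addr, size_t n, uint64_t *out, size_t max_out)
{
    assert(out != NULL || max_out == 0);
    uint64_t block = addr / BLOCK_SIZE;           /* block number of the access */
    size_t count = n < max_out ? n : max_out;
    for (size_t i = 0; i < count; i++)
        out[i] = (block + 1 + i) * BLOCK_SIZE;    /* start address of block + 1 + i */
    return count;
}
```

An access to address 0x1010 with n = 3 yields the candidate block addresses 0x1040, 0x1080 and 0x10C0.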


Software Prefetching (1/4)
• Compiler/programmer places prefetch instructions
• Put prefetched value into...
  – Register (binding, also called “hoisting”)
    • May prevent instructions from committing
  – Cache (non-binding)
    • Requires ISA support
    • May get evicted from cache before demand
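The binding/non-binding distinction can be sketched in C. The first function hoists the load of the next element into a local variable (a register); the second uses the GCC/Clang builtin `__builtin_prefetch`, which only hints that the block should be brought into the cache. Both compute the same sum. The function names and the `distance` parameter are illustrative assumptions, and an optimizing compiler is free to reschedule the hoisted load.

```c
#include <assert.h>
#include <stddef.h>

/* Binding prefetch ("hoisting"): the load of a[i + 1] is moved up one
   iteration and its value is held in a local variable (a register).
   A hoisted load must wait for its data before it can commit, so a
   miss here can hold up commit of younger instructions. */
long sum_binding(const long *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n == 0)
        return 0;
    long sum = 0;
    long next = a[0];              /* first load hoisted out of the loop */
    for (size_t i = 0; i < n; i++) {
        long cur = next;
        if (i + 1 < n)
            next = a[i + 1];       /* hoisted load for the next iteration */
        sum += cur;
    }
    return sum;
}

/* Non-binding prefetch: __builtin_prefetch(addr, rw, locality) is only a
   hint with no architectural effect (rw = 0: read; locality = 3: keep in
   all cache levels). The real load a[i] still runs later, and the block
   may have been evicted again by then. */
long sum_nonbinding(const long *a, size_t n, size_t distance)
{
    assert(a != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + distance < n)
            __builtin_prefetch(&a[i + distance], 0, 3);
        sum += a[i];
    }
    return sum;
}
```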


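Software Prefetching (2/4)
[Figure: three versions of a code region with basic blocks A, B, C and the instructions R1 = R1 - 1, R1 = [R2], R3 = R1 + 4 (cache misses shown in red): the original code; the load R1 = [R2] hoisted up to A (“Hopefully the load miss is serviced by the time we get to the consumer”); and PREFETCH [R2] placed in A instead]
• Hoisting must be aware of dependencies
  – Moving the load R1 = [R2] above another write to R1 (such as R1 = R1 - 1) changes the value its consumer R3 = R1 + 4 sees
• Using a prefetch instruction can avoid problems with data dependencies
  – Only the cache block is fetched early; the register write stays in place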
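The dependency problem can be reproduced with C variables standing in for registers. This sketch assumes one reading of the figure: R1 = R1 - 1 executes before the load R1 = [R2] on the path to the consumer, and its result is needed later (modelled by the hypothetical out-parameter `b_out`). All function names are hypothetical. Naively hoisting the load changes both results, while the prefetch version matches the original; a correct hoist would have to rename the load's destination register.

```c
#include <assert.h>
#include <stddef.h>

/* Original order: R1 = R1 - 1, then the load R1 = [R2] and its consumer. */
long original_order(long r1, const long *r2, long *b_out)
{
    assert(r2 != NULL && b_out != NULL);
    r1 = r1 - 1;        /* block B: R1 = R1 - 1 */
    *b_out = r1;        /* B's result is used later (assumed) */
    r1 = *r2;           /* block C: R1 = [R2], the load that misses */
    return r1 + 4;      /* R3 = R1 + 4 */
}

/* Naive hoisting: the load moves above B, so B now decrements the
   loaded value and both B's result and R3 are wrong. */
long naive_hoist(long r1, const long *r2, long *b_out)
{
    assert(r2 != NULL && b_out != NULL);
    r1 = *r2;           /* hoisted into block A */
    r1 = r1 - 1;        /* block B clobbers the loaded value */
    *b_out = r1;
    return r1 + 4;
}

/* Prefetch instead: only the cache block is requested early; the load
   stays in C, so the register dependency is untouched. */
long prefetch_in_a(long r1, const long *r2, long *b_out)
{
    assert(r2 != NULL && b_out != NULL);
    __builtin_prefetch(r2);  /* block A: PREFETCH [R2] (GCC/Clang builtin) */
    r1 = r1 - 1;             /* block B */
    *b_out = r1;
    r1 = *r2;                /* block C: load, hopefully a hit now */
    return r1 + 4;
}
```

With r1 = 10 and *r2 = 100, the original and prefetch versions return 104 and leave 9 in `b_out`; the naive hoist returns 103 and leaves 99.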


Software Prefetching (3/4)
for (I = 1; I < rows; I++) {
    for (J = 1; J < columns; J++) {
        prefetch(&x[I+1][J]);
        sum = sum + x[I][J];
    }
}
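A compilable version of the loop above, using the GCC/Clang `__builtin_prefetch` builtin in place of the slide's `prefetch()`. It assumes a row-major `double` array, 0-based indices, and C99 variable-length array parameters; the function name is hypothetical. The `i + 1 < rows` guard keeps the prefetch address inside the array on the last row.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a row-major rows x columns array. While reading x[i][j], prefetch
   x[i + 1][j], the same column one row ahead, as on the slide. */
double sum_with_prefetch(size_t rows, size_t columns, double x[rows][columns])
{
    assert(x != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < columns; j++) {
            if (i + 1 < rows)      /* stay inside the array on the last row */
                __builtin_prefetch(&x[i + 1][j], 0, 3);  /* read, high locality */
            sum = sum + x[i][j];
        }
    }
    return sum;
}
```

With an assumed 64-byte block holding eight `double`s, seven of every eight prefetches here target a block that was just requested; issuing one prefetch per block (for example only when `j % 8 == 0`) would reduce the extra instructions.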


Software Prefetching (4/4)
• Pros:
  – Gives programmer control and flexibility
  – Allows time for complex (compiler) analysis
  – No (major) hardware modifications needed
• Cons:
  – Hard to perform timely prefetches
    • At IPC = 2 and 100-cycle memory → move load 200 inst. earlier
    • Might not even have 200 inst. in current function
  – Prefetching earlier and more often leads to low accuracy
    • Program may go down a different path
  – Prefetch instructions increase code footprint
    • May cause more I$ misses, code alignment issues
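The 200-instruction figure follows from multiplying throughput by latency: to hide a 100-cycle miss at 2 instructions per cycle, the prefetch must be issued 2 × 100 = 200 instructions before the load. A small C sketch of that arithmetic, plus a conversion to loop iterations for an assumed loop-body size; both function names are hypothetical.

```c
#include <assert.h>

/* Instructions executed while one miss is outstanding, i.e. how far
   ahead (in instructions) a prefetch must be issued to hide it fully. */
long hoist_distance_insts(double ipc, long miss_latency_cycles)
{
    assert(ipc > 0.0 && miss_latency_cycles >= 0);
    return (long)(ipc * (double)miss_latency_cycles + 0.5);  /* round to nearest */
}

/* The same distance in loop iterations, rounded up, for a loop body of
   insts_per_iter instructions. */
long prefetch_distance_iters(long hoist_insts, long insts_per_iter)
{
    assert(hoist_insts >= 0 && insts_per_iter > 0);
    return (hoist_insts + insts_per_iter - 1) / insts_per_iter;
}
```

For an assumed 8-instruction loop body, 200 instructions correspond to a prefetch distance of 25 iterations.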


Hardware Prefetching (1/3)
• Hardware monitors memory accesses
  – Looks for common patterns
• Guessed addresses are placed into prefetch queue
  – Queue is checked when no demand accesses are waiting
• Prefetchers look like READ requests to the hierarchy
  – Although they may get a special “prefetched” flag in the state bits
• Prefetchers trade bandwidth for latency
  – Extra bandwidth used only when guessing incorrectly
  – Latency reduced only when guessing correctly
• No need to change software
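A toy C model of the mechanism described above: a detector watches demand addresses, pushes guessed block addresses into a small FIFO prefetch queue, and the queue is drained only when no demand access is waiting. The sequential-block pattern, the 64-byte block size, the queue depth, and all names are assumptions; setting the “prefetched” state bit on the filled line is left to the (hypothetical) caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BITS 6   /* assumed 64-byte blocks */
#define PFQ_SIZE   8   /* assumed prefetch-queue capacity */

typedef struct {
    uint64_t blocks[PFQ_SIZE];  /* FIFO of guessed block numbers */
    size_t head, count;
    uint64_t last_block;        /* previous demand block (pattern history) */
    bool have_last;
} prefetcher;

static bool pfq_push(prefetcher *p, uint64_t block)
{
    if (p->count == PFQ_SIZE)
        return false;           /* queue full: drop the guess */
    p->blocks[(p->head + p->count) % PFQ_SIZE] = block;
    p->count++;
    return true;
}

/* Observe a demand access. The only "common pattern" detected here is a
   sequential step: if this block follows the previous one, guess that
   the next block will be needed too. */
void prefetcher_observe(prefetcher *p, uint64_t addr)
{
    assert(p != NULL);
    uint64_t block = addr >> BLOCK_BITS;
    if (p->have_last && block == p->last_block + 1)
        pfq_push(p, block + 1);
    p->last_block = block;
    p->have_last = true;
}

/* Called once per cycle: a guess is sent to the hierarchy as a READ only
   when no demand access is waiting, so demand requests keep priority.
   Returns true and stores the block address in *out when one is issued. */
bool prefetcher_issue(prefetcher *p, bool demand_waiting, uint64_t *out)
{
    assert(p != NULL && out != NULL);
    if (demand_waiting || p->count == 0)
        return false;
    *out = p->blocks[p->head] << BLOCK_BITS;
    p->head = (p->head + 1) % PFQ_SIZE;
    p->count--;
    return true;
}
```

After demand accesses to 0x1000 and 0x1040, the model queues block 0x1080 and issues it on the first cycle with no demand waiting.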
