
Modern Computer Architecture: course slides (English lecture notes), Lecture 02: Memory Hierarchy and Caches


Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing
Lecture 2: Memory Hierarchy & Caches

This Lecture
• Memory Hierarchy
• Caches / SRAM
• Cache Organization and Optimizations

Motivation
• Want memory to appear:
– As fast as CPU
– As large as required by all of the running applications
[Figure: Performance (log scale, 1 to 10000) vs. year (1985 to 2010), with curves for Processor and Memory]

Storage Hierarchy
• Make common case fast:
– Common: temporal & spatial locality
– Fast: smaller, more expensive memory
[Figure: hierarchy levels, top to bottom: Registers; Caches (SRAM); Memory (DRAM); [SSD? (Flash)]; Disk (Magnetic Media). Toward the top: faster, more bandwidth. Toward the bottom: larger, cheaper, bigger transfers. Side labels: "Controlled by Hardware" and "Controlled by Software (OS)".]
• What is S(tatic)RAM vs D(ynamic)RAM?

Caches
• An automatically managed hierarchy
• Break memory into blocks (several bytes) and transfer data to/from cache in blocks
– spatial locality
• Keep recently accessed blocks
– temporal locality
[Figure: Core, cache ($), Memory]

Cache Terminology
• block (cache line): minimum unit that may be cached
• frame: cache storage location to hold one block
• hit: block is found in the cache
• miss: block is not found in the cache
• miss ratio: fraction of references that miss
• hit time: time to access the cache
• miss penalty: time to replace block on a miss

Cache Example
• Address sequence from core (assume 8-byte lines):
– 0x10000: Miss
– 0x10004: Hit
– 0x10120: Miss
– 0x10008: Miss
– 0x10124: Hit
– 0x10004: Hit
• Final miss ratio is 50%
[Figure: Core and Memory, with blocks 0x10000 (...data...), 0x10120 (...data...), 0x10008 (...data...)]
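The hit/miss outcomes in this example can be reproduced with a short simulation. This is a minimal sketch, not part of the original slides: the function name `simulate` is hypothetical, and it assumes a cache large enough that nothing is ever evicted, since the slide does not state a cache size.

```python
LINE_SIZE = 8  # bytes per cache line, as stated on the slide


def simulate(addresses, line_size=LINE_SIZE):
    """Classify each access as "Hit" or "Miss".

    Assumption (not from the slides): the cache never evicts a line.
    """
    cached_lines = set()
    outcomes = []
    for addr in addresses:
        # Base address of the line that contains this byte address.
        line = addr // line_size * line_size
        if line in cached_lines:
            outcomes.append("Hit")
        else:
            outcomes.append("Miss")
            cached_lines.add(line)  # the whole line is fetched on a miss
    return outcomes


sequence = [0x10000, 0x10004, 0x10120, 0x10008, 0x10124, 0x10004]
outcomes = simulate(sequence)
miss_ratio = outcomes.count("Miss") / len(outcomes)

for addr, outcome in zip(sequence, outcomes):
    print(f"{addr:#x}: {outcome}")
print(f"miss ratio = {miss_ratio:.0%}")
```

The second access to 0x10004 and the access to 0x10124 hit because a neighbouring address already brought in the same 8-byte line, which is the spatial locality effect described on the Caches slide.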

AMAT (1/2)
• Very powerful tool to estimate performance
• If ...
– cache hit is 10 cycles (core to L1 and back)
– memory access is 100 cycles (core to mem and back)
• Then ...
– at 50% miss ratio, avg. access: 0.5×10 + 0.5×100 = 55
– at 10% miss ratio, avg. access: 0.9×10 + 0.1×100 = 19
– at 1% miss ratio, avg. access: 0.99×10 + 0.01×100 ≈ 11
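The arithmetic on this slide can be written as a one-line function. The sketch below is an illustration only: the name `amat` and its parameter names are hypothetical. It follows the slide's convention that both latencies are full round trips from the core, so a miss costs the memory latency alone rather than hit time plus a penalty.

```python
def amat(miss_ratio, hit_cycles, memory_cycles):
    """Average access time in cycles for a single cache level.

    Both latencies are round-trip from the core, as on the slide.
    """
    return (1 - miss_ratio) * hit_cycles + miss_ratio * memory_cycles


for m in (0.5, 0.1, 0.01):
    print(f"miss ratio {m:.0%}: {amat(m, 10, 100):.1f} cycles")
```

With the slide's numbers this gives 55, 19, and 10.9 cycles, the last of which the slide rounds to 11.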

AMAT (2/2)
• Generalizes nicely to any-depth hierarchy
• If ...
– L1 cache hit is 5 cycles (core to L1 and back)
– L2 cache hit is 20 cycles (core to L2 and back)
– memory access is 100 cycles (core to mem and back)
• Then ...
– at 20% miss ratio in L1 and 40% miss ratio in L2 ...
– avg. access: 0.8×5 + 0.2×(0.6×20 + 0.4×100) ≈ 14
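The nested expression on this slide extends to any number of levels by folding from memory back toward the core. The following is a sketch under the slide's assumptions (each latency is a round trip from the core, and each miss ratio is local to its level). The name `amat_hierarchy` and the list-of-tuples layout are hypothetical choices for illustration.

```python
def amat_hierarchy(levels, memory_cycles):
    """Average access time in cycles for a multi-level cache hierarchy.

    levels: list of (hit_cycles, local_miss_ratio), ordered from L1 outward.
    memory_cycles: round-trip latency of main memory from the core.
    """
    avg = memory_cycles  # cost of an access that misses in every cache
    for hit_cycles, miss_ratio in reversed(levels):
        # A hit at this level costs hit_cycles; a miss costs the
        # average time of everything further from the core.
        avg = (1 - miss_ratio) * hit_cycles + miss_ratio * avg
    return avg


# L1: 5 cycles, 20% miss ratio; L2: 20 cycles, 40% miss ratio; memory: 100 cycles.
print(f"{amat_hierarchy([(5, 0.2), (20, 0.4)], 100):.1f} cycles")
```

For the slide's parameters the fold evaluates 0.6×20 + 0.4×100 = 52 for the L2 stage, then 0.8×5 + 0.2×52 = 14.4, matching the slide's ≈ 14.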

Memory Organization (1/3)
[Figure: block diagram labeled Processor, with Registers, I-TLB, L1 I-Cache, D-TLB, L1 D-Cache, L2 Cache, L3 Cache (LLC), and Main Memory (DRAM)]
