Modern Computer Architecture course slides (English lecture notes), Lecture 02: Memory Hierarchy and Caches

Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing
Lecture 2: Memory Hierarchy & Caches

This Lecture
• Memory Hierarchy
• Caches / SRAM
• Cache Organization and Optimizations

Motivation
[Figure: processor vs. memory performance, 1985 to 2010, log-scale performance axis from 1 to 10000]
• Want memory to appear:
– As fast as CPU
– As large as required by all of the running applications

Storage Hierarchy
• Make common case fast:
– Common: temporal & spatial locality
– Fast: smaller, more expensive memory
• Levels, from top to bottom:
– Registers
– Caches (SRAM)
– Memory (DRAM)
– [SSD? (Flash)]
– Disk (Magnetic Media)
• Upper levels are controlled by hardware; lower levels are controlled by software (OS)
• Moving up: faster, more bandwidth
• Moving down: larger, cheaper, bigger transfers
• What is S(tatic)RAM vs D(ynamic)RAM?

Caches
[Diagram: Core, $ (cache), Memory]
• An automatically managed hierarchy
• Break memory into blocks (several bytes) and transfer data to/from cache in blocks
– spatial locality
• Keep recently accessed blocks
– temporal locality
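The block idea can be sketched in a few lines. This is a minimal illustration, assuming 8-byte blocks (the size used in the slides' later cache example); `BLOCK_SIZE` and `split_address` are hypothetical names chosen here, not part of the lecture.

```python
BLOCK_SIZE = 8  # bytes per block; an assumed value for illustration


def split_address(addr, block_size=BLOCK_SIZE):
    """Return (block_number, byte_offset_within_block) for a byte address."""
    return addr // block_size, addr % block_size


# Two nearby addresses land in the same block, so fetching the block
# for the first access also brings in the data for the second
# (spatial locality).
print(split_address(0x10000))
print(split_address(0x10004))
```

Addresses that share a block number are served by the same cached block; the offset only selects bytes inside it.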

Cache Terminology
• block (cache line): minimum unit that may be cached
• frame: cache storage location to hold one block
• hit: block is found in the cache
• miss: block is not found in the cache
• miss ratio: fraction of references that miss
• hit time: time to access the cache
• miss penalty: time to replace block on a miss

Cache Example
• Address sequence from core (assume 8-byte lines):
– 0x10000: Miss
– 0x10004: Hit
– 0x10120: Miss
– 0x10008: Miss
– 0x10124: Hit
– 0x10004: Hit
• Lines held in the cache afterwards: 0x10000 (...data...), 0x10008 (...data...), 0x10120 (...data...)
• Final miss ratio is 50% (3 misses in 6 references)
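The hit/miss sequence above can be reproduced with a small simulation. This is a sketch under a stated assumption: the cache is large enough that no block is ever evicted, so it is modeled as a plain set of block numbers. The function name `simulate` is hypothetical.

```python
def simulate(addresses, block_size=8):
    """Classify each reference as a hit or a miss.

    Assumes an initially empty cache that never evicts a block.
    Returns (outcomes, miss_ratio).
    """
    cached_blocks = set()
    outcomes = []
    for addr in addresses:
        block = addr // block_size  # block number containing this address
        if block in cached_blocks:
            outcomes.append("Hit")
        else:
            outcomes.append("Miss")
            cached_blocks.add(block)  # bring the whole block into the cache
    miss_ratio = outcomes.count("Miss") / len(addresses)
    return outcomes, miss_ratio


sequence = [0x10000, 0x10004, 0x10120, 0x10008, 0x10124, 0x10004]
outcomes, miss_ratio = simulate(sequence)
for addr, outcome in zip(sequence, outcomes):
    print(f"{addr:#x}: {outcome}")
print(f"miss ratio = {miss_ratio:.0%}")
```

The first touch of each 8-byte line misses; every later access to the same line hits, which gives 3 misses out of 6 references.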

AMAT: Average Memory Access Time (1/2)
• Very powerful tool to estimate performance
• If ...
– cache hit is 10 cycles (core to L1 and back)
– memory access is 100 cycles (core to mem and back)
• Then ...
– at 50% miss ratio, avg. access: 0.5×10 + 0.5×100 = 55
– at 10% miss ratio, avg. access: 0.9×10 + 0.1×100 = 19
– at 1% miss ratio, avg. access: 0.99×10 + 0.01×100 ≈ 11
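The arithmetic on this slide can be checked with a short function. This is a sketch that follows the slide's weighting, where the hit time and the memory time are both full round trips from the core; the name `amat_single_level` is hypothetical.

```python
def amat_single_level(hit_time, memory_time, miss_ratio):
    """Average access time in cycles for one cache level.

    Hits cost hit_time; misses cost memory_time (full round trip to memory).
    """
    return (1 - miss_ratio) * hit_time + miss_ratio * memory_time


for miss_ratio in (0.5, 0.1, 0.01):
    cycles = amat_single_level(hit_time=10, memory_time=100, miss_ratio=miss_ratio)
    print(f"miss ratio {miss_ratio:.0%}: {cycles:.1f} cycles")
```

Lowering the miss ratio from 50% to 1% brings the average from 55 cycles down to about 11, close to the 10-cycle hit time.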

AMAT (2/2)
• Generalizes nicely to any-depth hierarchy
• If ...
– L1 cache hit is 5 cycles (core to L1 and back)
– L2 cache hit is 20 cycles (core to L2 and back)
– memory access is 100 cycles (core to mem and back)
• Then ...
– at 20% miss ratio in L1 and 40% miss ratio in L2 ...
– avg. access: 0.8×5 + 0.2×(0.6×20 + 0.4×100) ≈ 14
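The nested formula extends to any number of levels by folding from memory back toward the core. This is a minimal sketch, assuming each miss ratio is local to its level (misses at that level divided by accesses reaching that level); `amat_multilevel` is a hypothetical name.

```python
def amat_multilevel(levels, memory_time):
    """Average access time in cycles for a multi-level hierarchy.

    levels: list of (hit_time, local_miss_ratio) pairs, ordered from L1 outward.
    memory_time: round-trip time to main memory.
    """
    avg = memory_time
    # Work from the outermost cache inward: the cost of a miss at one
    # level is the average access time of everything below it.
    for hit_time, miss_ratio in reversed(levels):
        avg = (1 - miss_ratio) * hit_time + miss_ratio * avg
    return avg


# L1: 5 cycles, 20% miss ratio; L2: 20 cycles, 40% miss ratio; memory: 100 cycles
print(amat_multilevel([(5, 0.2), (20, 0.4)], memory_time=100))
```

With the slide's numbers this evaluates 0.8×5 + 0.2×(0.6×20 + 0.4×100), which is 14.4 before rounding.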

Memory Organization (1/3)
[Diagram: Processor]
• Registers
• I-TLB, D-TLB
• L1 I-Cache, L1 D-Cache
• L2 Cache
• L3 Cache (LLC)
• Main Memory (DRAM)
