Modern Computer Architecture: Course Slides (English Lecture Notes), Lecture 12: Shared Memory Multiprocessor

Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing
Lecture 12: Shared-Memory Multi-Processors

Shared-Memory Multiprocessors
• Multiple threads use shared memory (address space)
  – “SysV Shared Memory” or “Threads” in software
• Communication is implicit via loads and stores
  – Opposite of explicit message-passing multiprocessors
• Theoretical foundation: PRAM model
[Figure: processors P1, P2, P3, and P4 connected to a single memory system]
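The point about implicit communication can be illustrated with a minimal sketch, assuming Python's standard `threading` module as a stand-in for “Threads” in software. The names `shared`, `producer`, and `consumer` are illustrative and do not come from the slides. The data moves through an ordinary store and an ordinary load; only the synchronization is explicit.

```python
import threading

# Shared address space: both threads see the same dictionary object.
shared = {"data": None}
ready = threading.Event()  # synchronization still has to be explicit


def producer():
    shared["data"] = 42  # an ordinary store; there is no send() call
    ready.set()          # tell the consumer that the store has happened


def consumer(out):
    ready.wait()                # block until the producer has signalled
    out.append(shared["data"])  # an ordinary load; there is no receive() call


result = []
threads = [
    threading.Thread(target=consumer, args=(result,)),
    threading.Thread(target=producer),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(result)  # [42]
```

In a message-passing design the same exchange would need an explicit send and receive pair; here the only explicit step is the `Event` that orders the store before the load.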

Why Shared Memory?
• Pluses
  – Application sees a multitasking uniprocessor
  – OS needs only evolutionary extensions
  – Communication happens without the OS
• Minuses
  – Synchronization is complex
  – Communication is implicit (hard to optimize)
  – Hard to implement (in hardware)
• Result
  – SMPs and CMPs are the most successful machines to date
  – First with multi-billion-dollar markets

Paired vs. Separate Processor/Memory?
• Separate CPU/memory
  – Uniform memory access (UMA)
    • Equal latency to memory
  – Low peak performance
• Paired CPU/memory
  – Non-uniform memory access (NUMA)
    • Faster local memory
    • Data placement matters
  – High peak performance
[Figure: four CPU($)s with separate memories (UMA) vs. four CPU($)/Mem pairs, each attached to a router R (NUMA)]
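Why data placement matters under NUMA can be seen with a small back-of-the-envelope model. This is a sketch only: the latency figures below are assumed for illustration and are not taken from the slides, and `average_latency` is a hypothetical helper.

```python
def average_latency(local_fraction, local_ns, remote_ns):
    """Average memory latency when `local_fraction` of accesses hit local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Assumed, illustrative latencies in nanoseconds (not measurements).
UMA_NS = 100.0          # every access costs the same
NUMA_LOCAL_NS = 60.0    # access to the memory paired with this CPU
NUMA_REMOTE_NS = 180.0  # access to memory paired with another CPU

for fraction in (0.25, 0.5, 0.9):
    numa_ns = average_latency(fraction, NUMA_LOCAL_NS, NUMA_REMOTE_NS)
    verdict = "faster" if numa_ns < UMA_NS else "slower"
    print(f"{fraction:.0%} local: NUMA {numa_ns:.1f} ns, "
          f"{verdict} than UMA at {UMA_NS:.1f} ns")
```

With these assumed numbers, NUMA only beats UMA when more than two thirds of accesses are local, which is one way to read “data placement matters”.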

Shared vs. Point-to-Point Networks
• Shared network
  – Example: bus
  – Low latency
  – Low bandwidth
    • Doesn’t scale beyond ~16 cores
  – Simple cache coherence
• Point-to-point network
  – Example: mesh, ring
  – High latency (many “hops”)
  – Higher bandwidth
    • Scales to 1000s of cores
  – Complex cache coherence
[Figure: CPU($)s and memories on a shared bus vs. CPU($)/Mem nodes connected through routers R]
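The latency and bandwidth trade-off can be made concrete with a rough sketch. It assumes a bidirectional ring, a square 2D mesh without wraparound links, and a bus whose total bandwidth is split evenly among cores; the 64 GB/s figure and the function names are illustrative assumptions, not values from the slides.

```python
def bus_bandwidth_per_core(total_bandwidth, cores):
    """A shared bus divides one fixed bandwidth among all attached cores."""
    return total_bandwidth / cores


def ring_diameter(nodes):
    """Worst-case hop count on a bidirectional ring of `nodes` routers."""
    return nodes // 2


def mesh_diameter(side):
    """Worst-case hop count on a side x side 2D mesh (corner to corner)."""
    return 2 * (side - 1)


# Bus: one hop to everything, but per-core bandwidth shrinks as cores are added.
print(bus_bandwidth_per_core(64.0, 16))  # 4.0 (GB/s, from an assumed 64 GB/s bus)

# Point-to-point: more links in total, but messages may cross many hops.
print(ring_diameter(16), mesh_diameter(4))     # 8 6    (16 nodes each)
print(ring_diameter(1024), mesh_diameter(32))  # 512 62 (1024 nodes each)
```

The hop counts show why a mesh is usually preferred over a ring at large node counts, and the bus figure shows why a shared medium stops scaling well before that.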

Organizing Point-To-Point Networks
• Network topology: organization of network
  – Tradeoff: performance (connectivity, latency, bandwidth) vs. cost
• Router chips
  – Networks with separate router chips are indirect
  – Networks with processor/memory/router in one chip are direct
    • Fewer components, “Glueless MP”
[Figure: indirect network with separate routers R vs. direct network with R integrated into each CPU($)/Mem node]

Issues for Shared Memory Systems
• Two big ones
  – Cache coherence
  – Memory consistency model
• Closely related
• Often confused

Cache Coherence: The Problem (1/2)
• Variable A initially has value 0
• P1 stores value 1 into A
• P2 loads A from memory and sees old value 0
[Figure: P1 and P2, each with an L1 cache, on a bus to main memory. t1: P1 stores A=1, so P1’s L1 changes from A: 0 to A: 1 while main memory still holds A: 0. t2: P2 loads A.]
• Need to do something to keep P2’s cache coherent

Cache Coherence: The Problem (2/2)
• P1 and P2 have variable A (value 0) in their caches
• P1 stores value 1 into A
• P2 loads A from its cache and sees old value 0
[Figure: P1 and P2, each with an L1 cache, on a bus to main memory. t1: P1 stores A=1, so P1’s L1 changes from A: 0 to A: 1 while P2’s L1 and main memory still hold A: 0. t2: P2 loads A.]
• Need to do something to keep P2’s cache coherent
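Both versions of the problem can be reproduced with a toy model. This is a minimal sketch that assumes private write-back caches with no coherence protocol at all; the `Cache` class is hypothetical and models only what the two scenarios need.

```python
class Cache:
    """A private write-back cache with no coherence protocol (toy model)."""

    def __init__(self, memory):
        self.memory = memory  # main memory shared by all caches
        self.lines = {}       # this cache's private copies

    def load(self, addr):
        if addr not in self.lines:  # miss: fetch the value from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]     # hit: return the private copy

    def store(self, addr, value):
        # Write-back: only the private copy changes, main memory does not.
        self.lines[addr] = value


# Problem (1/2): P2 has no copy of A and reads stale main memory.
memory = {"A": 0}
p1, p2 = Cache(memory), Cache(memory)
p1.store("A", 1)       # t1: Store A=1
case1 = p2.load("A")   # t2: Load A misses and fetches the old value

# Problem (2/2): both caches already hold A, so P2 hits on a stale copy.
memory = {"A": 0}
p1, p2 = Cache(memory), Cache(memory)
p1.load("A")
p2.load("A")
p1.store("A", 1)       # t1: Store A=1
case2 = p2.load("A")   # t2: Load A hits the old private copy

print(case1, case2)  # 0 0
```

In both cases P2 observes 0 after P1 has written 1, which is the behaviour a coherence mechanism has to rule out.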

Approaches to Cache Coherence
• Software-based solutions
  – Mechanisms:
    • Mark cache blocks/memory pages as cacheable/non-cacheable
    • Add “Flush” and “Invalidate” instructions
  – Could be done by compiler or run-time system
  – Difficult to get perfect (e.g., what about memory aliasing?)
• Hardware solutions are far more common
  – System ensures everyone always sees the latest value
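The software-based approach can be sketched by giving a toy cache explicit “Flush” and “Invalidate” operations. This is an illustration under stated assumptions: `SoftwareManagedCache` and its methods are hypothetical and do not model any real instruction set, and the calls that a compiler or run-time system would insert are written by hand here.

```python
class SoftwareManagedCache:
    """Toy write-back cache that software keeps coherent by hand."""

    def __init__(self, memory):
        self.memory = memory  # main memory shared by all caches
        self.lines = {}       # private copies
        self.dirty = set()    # addresses modified but not yet written back

    def load(self, addr):
        if addr not in self.lines:  # miss: fetch from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)

    def flush(self, addr):
        """Write a dirty line back to main memory."""
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)

    def invalidate(self, addr):
        """Drop the private copy so the next load misses.

        Any unflushed modification to this line is lost.
        """
        self.lines.pop(addr, None)
        self.dirty.discard(addr)


memory = {"A": 0}
p1, p2 = SoftwareManagedCache(memory), SoftwareManagedCache(memory)
p1.load("A")
p2.load("A")

p1.store("A", 1)
stale = p2.load("A")  # still the old value: nothing has been flushed yet

p1.flush("A")         # writer side: push the new value to main memory
p2.invalidate("A")    # reader side: discard the stale private copy
fresh = p2.load("A")  # misses and fetches the new value

print(stale, fresh)  # 0 1
```

The sketch only works because both calls name the same address; if software reached A through a second name (the aliasing case noted above), it would have to know to flush and invalidate that alias too.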
