Modern Computer Architecture: Course Slides (English Lecture Notes), Lecture 12: Shared-Memory Multiprocessors

Advanced Computer Architecture Design and Its Applications in Data Centers and Cloud Computing
Lecture 12: Shared-Memory Multiprocessors

Shared-Memory Multiprocessors
• Multiple threads use shared memory (address space)
  – “SysV Shared Memory” or “Threads” in software
• Communication is implicit, via loads and stores
  – Opposite of explicit message-passing multiprocessors
• Theoretical foundation: PRAM model
[Figure: processors P1, P2, P3, and P4 connected to a single memory system]
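As a minimal sketch of implicit communication through loads and stores, the example below uses two Python threads that share one address space. The names `mailbox`, `ready`, `producer`, `consumer`, and `run` are hypothetical and chosen only for illustration; the `threading.Event` provides the synchronization that shared memory by itself does not.

```python
import threading

# Both threads see the same `mailbox` object: a shared address space.
mailbox = {"data": None}
ready = threading.Event()  # explicit synchronization is still required

def producer():
    mailbox["data"] = 42   # an ordinary store to shared memory
    ready.set()            # signal that the store has happened

def consumer(result):
    ready.wait()                     # block until the producer has written
    result.append(mailbox["data"])   # an ordinary load from shared memory

def run():
    result = []
    threads = [
        threading.Thread(target=consumer, args=(result,)),
        threading.Thread(target=producer),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]

print(run())
```

No message is ever sent: the consumer simply reads the location the producer wrote. In a message-passing design the value would instead travel through an explicit send/receive pair.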

Why Shared Memory?
• Pluses
  – App sees a multitasking uniprocessor
  – OS needs only evolutionary extensions
  – Communication happens without the OS
• Minuses
  – Synchronization is complex
  – Communication is implicit (hard to optimize)
  – Hard to implement (in hardware)
• Result
  – SMPs and CMPs are the most successful parallel machines to date
  – First with multi-billion-dollar markets

Paired vs. Separate Processor/Memory?
• Separate CPU/memory
  – Uniform memory access (UMA)
    • Equal latency to memory
  – Low peak performance
• Paired CPU/memory
  – Non-uniform memory access (NUMA)
    • Faster local memory
    • Data placement matters
  – High peak performance
[Figure: two organizations of CPU($), Mem, and router (R) blocks: CPUs separate from the memories (UMA), and each CPU paired with a local memory and router (NUMA)]
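To illustrate why data placement matters on a NUMA machine, the sketch below computes a weighted-average memory latency. The function `average_latency` and all latency figures are assumptions made up for this illustration; they are not measurements and do not come from the lecture.

```python
def average_latency(local_fraction, local_ns, remote_ns):
    """Mean memory latency when `local_fraction` of accesses hit local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns

# Hypothetical latencies, chosen only for illustration.
UMA_NS = 100.0          # every access costs the same on a UMA machine
NUMA_LOCAL_NS = 60.0    # access to the node's own memory
NUMA_REMOTE_NS = 150.0  # access that crosses the interconnect

good_placement = average_latency(0.9, NUMA_LOCAL_NS, NUMA_REMOTE_NS)
bad_placement = average_latency(0.1, NUMA_LOCAL_NS, NUMA_REMOTE_NS)

print(f"UMA:                 {UMA_NS:.1f} ns")
print(f"NUMA, 90% local:     {good_placement:.1f} ns")
print(f"NUMA, 10% local:     {bad_placement:.1f} ns")
```

Under these assumed numbers, NUMA beats the uniform latency when most accesses are local (about 69 ns) and loses to it when most are remote (about 141 ns).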

Shared vs. Point-to-Point Networks
• Shared network
  – Example: bus
  – Low latency
  – Low bandwidth
    • Doesn’t scale beyond ~16 cores
  – Simple cache coherence
• Point-to-point network
  – Example: mesh, ring
  – High latency (many “hops”)
  – Higher bandwidth
    • Scales to 1000s of cores
  – Complex cache coherence
[Figure: CPU($)/Mem nodes attached to one shared bus, and CPU($)/Mem nodes connected to each other through routers (R)]
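The "many hops" cost of a point-to-point network can be made concrete with a small calculation. The sketch below, with the hypothetical helpers `ring_hops`, `mesh_hops`, and `average_hops`, computes the average shortest-path hop count between distinct nodes for a bidirectional ring and a 2D mesh without wraparound links. It assumes minimal routing and ignores contention.

```python
from itertools import product

def ring_hops(a, b, n):
    """Shortest hop count between nodes a and b on a bidirectional ring of n nodes."""
    d = abs(a - b)
    return min(d, n - d)

def mesh_hops(a, b, side):
    """Hop count between nodes a and b on a side x side 2D mesh (no wraparound)."""
    ax, ay = divmod(a, side)
    bx, by = divmod(b, side)
    return abs(ax - bx) + abs(ay - by)

def average_hops(hop_fn, n, param):
    """Average hop count over all ordered pairs of distinct nodes."""
    pairs = [(a, b) for a, b in product(range(n), repeat=2) if a != b]
    return sum(hop_fn(a, b, param) for a, b in pairs) / len(pairs)

ring_avg = average_hops(ring_hops, 16, 16)  # 16-node ring
mesh_avg = average_hops(mesh_hops, 16, 4)   # 4 x 4 mesh

print(f"16-node ring: {ring_avg:.2f} hops on average")
print(f"4x4 mesh:     {mesh_avg:.2f} hops on average")
```

A bus, by contrast, reaches every node in a single shared transfer, which is why its latency is low while its bandwidth does not grow with the number of cores.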

Organizing Point-to-Point Networks
• Network topology: organization of the network
  – Trade-off: performance (connectivity, latency, bandwidth) vs. cost
• Router chips
  – Networks with separate router chips are indirect
  – Networks with processor, memory, and router on one chip are direct
    • Fewer components, “Glueless MP”
[Figure: an indirect network with separate routers (R) between CPU($)/Mem nodes, and a direct network with a router (R) in each CPU($)/Mem node]

Issues for Shared Memory Systems
• Two big ones
  – Cache coherence
  – Memory consistency model
• Closely related
• Often confused

Cache Coherence: The Problem (1/2)
• Variable A initially has value 0
• P1 stores value 1 into A
• P2 loads A from memory and sees old value 0
[Figure: P1 and P2, each with an L1 cache, share a bus to main memory. At t1, P1 stores A=1 and the copy in its L1 changes from 0 to 1, while main memory still holds A: 0. At t2, P2 loads A.]
• Need to do something to keep P2’s cache coherent
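The scenario on this slide can be reproduced with a toy model. The sketch below assumes write-back private caches with no coherence protocol; the class `WriteBackCache` and the dictionary standing in for main memory are hypothetical simplifications, not a description of real hardware.

```python
class WriteBackCache:
    """Toy private cache with no coherence protocol (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory   # shared dict standing in for main memory
        self.lines = {}        # address -> value held in this cache

    def load(self, addr):
        if addr not in self.lines:           # miss: fetch from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value             # write-back: memory is not updated

memory = {"A": 0}
p1 = WriteBackCache(memory)
p2 = WriteBackCache(memory)

p1.store("A", 1)       # t1: P1 stores A=1; only P1's cache changes
seen = p2.load("A")    # t2: P2 misses and reads the old value from memory

print("P1 sees", p1.load("A"), "but P2 sees", seen)
```

The new value lives only in P1's cache, so P2's miss is served from main memory, which still holds the old value.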

Cache Coherence: The Problem (2/2)
• P1 and P2 have variable A (value 0) in their caches
• P1 stores value 1 into A
• P2 loads A from its cache and sees old value 0
[Figure: P1 and P2, each with an L1 cache holding A: 0, share a bus to main memory, which also holds A: 0. At t1, P1 stores A=1 and the copy in its L1 changes from 0 to 1. At t2, P2 loads A from its own L1.]
• Need to do something to keep P2’s cache coherent
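This second case is a stale hit rather than a stale miss, so updating main memory does not help by itself. The sketch below makes that point with a hypothetical `WriteThroughCache` class: it assumes write-through private caches with no coherence protocol, where every store also updates memory, yet P2 still reads the old value from its own cache.

```python
class WriteThroughCache:
    """Toy write-through cache with no coherence protocol (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory   # shared dict standing in for main memory
        self.lines = {}        # address -> value held in this cache

    def load(self, addr):
        if addr not in self.lines:           # miss: fetch from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]              # hit: memory is never consulted

    def store(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value            # write-through: memory is updated

memory = {"A": 0}
p1 = WriteThroughCache(memory)
p2 = WriteThroughCache(memory)

p1.load("A")            # both processors cache A = 0
p2.load("A")
p1.store("A", 1)        # t1: P1 stores A=1 (its cache and memory now hold 1)
stale = p2.load("A")    # t2: P2 hits in its own cache and sees the old value

print("memory holds", memory["A"], "but P2 sees", stale)
```

Something has to tell P2's cache that its copy is out of date, for example by invalidating or updating it.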

Approaches to Cache Coherence
• Software-based solutions
  – Mechanisms:
    • Mark cache blocks/memory pages as cacheable/non-cacheable
    • Add “Flush” and “Invalidate” instructions
  – Could be done by compiler or run-time system
  – Difficult to get perfect (e.g., what about memory aliasing?)
• Hardware solutions are far more common
  – System ensures everyone always sees the latest value
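A software-based scheme built on “Flush” and “Invalidate” might look like the sketch below. The class `SoftwareManagedCache` and its `flush` and `invalidate` methods are hypothetical stand-ins for such instructions in a toy write-back cache model; real instruction sets differ in names and granularity.

```python
class SoftwareManagedCache:
    """Toy write-back cache whose coherence is managed explicitly by software."""

    def __init__(self, memory):
        self.memory = memory   # shared dict standing in for main memory
        self.lines = {}        # address -> cached value
        self.dirty = set()     # addresses modified but not yet written back

    def load(self, addr):
        if addr not in self.lines:           # miss: fetch from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)

    def flush(self, addr):
        """Write a dirty line back to main memory."""
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)

    def invalidate(self, addr):
        """Drop the cached copy so the next load refetches from memory."""
        self.lines.pop(addr, None)
        self.dirty.discard(addr)

memory = {"A": 0}
p1 = SoftwareManagedCache(memory)
p2 = SoftwareManagedCache(memory)

p1.load("A")
p2.load("A")                  # both caches hold A = 0
p1.store("A", 1)
p1.flush("A")                 # writer pushes the new value to memory
before = p2.load("A")         # reader forgot to invalidate: still stale
p2.invalidate("A")            # reader discards its old copy
after = p2.load("A")          # miss, refetched from memory

print("before invalidate:", before, "after invalidate:", after)
```

Both steps are needed, and both must be placed correctly by the compiler or run-time system. Omitting either one leaves a stale value visible, which is part of why this approach is difficult to get perfect.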