"Modern Computer Architecture" Course Slides (International Student Edition): Lecture 7 Multiprocessors


Multiprocessors and Issues in Multiprocessing

Flynn’s Taxonomy of Computers
• Mike Flynn, “Very High-Speed Computing Systems,” Proc. of IEEE, 1966
• SISD: Single instruction operates on single data element
• SIMD: Single instruction operates on multiple data elements (see the sketch after this list)
  – Array processor
  – Vector processor
• MISD: Multiple instructions operate on single data element
  – Closest form: systolic array processor, streaming processor
• MIMD: Multiple instructions operate on multiple data elements (multiple instruction streams)
  – Multiprocessor
  – Multithreaded processor
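To make the SISD vs. SIMD distinction concrete, here is a minimal C sketch (illustrative only; the slides give no code). The scalar loop is the SISD view: each add instruction touches one pair of elements. The second version uses x86 SSE intrinsics from <immintrin.h>, so a single ADDPS instruction operates on four floats at once; for brevity it assumes n is a multiple of 4.

    #include <immintrin.h>  /* x86 SSE intrinsics */

    /* SISD view: one add instruction per pair of elements. */
    void add_scalar(const float *a, const float *b, float *c, int n) {
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }

    /* SIMD view: one ADDPS instruction adds four pairs at once.
       Assumes n is a multiple of 4 to keep the sketch short. */
    void add_simd(const float *a, const float *b, float *c, int n) {
        for (int i = 0; i < n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
        }
    }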

Why Parallel Computers?
• Parallelism: Doing multiple things at a time
• Things: instructions, operations, tasks
• Main Goal
  – Improve performance (execution time or task throughput)
    • Execution time of a program governed by Amdahl’s Law (see the formula after this list)
• Other Goals
  – Reduce power consumption
    • (4N units at freq F/4) consume less power than (N units at freq F)
    • Why?
  – Improve cost efficiency and scalability, reduce complexity
    • Harder to design a single unit that performs as well as N simpler units
  – Improve dependability: Redundant execution in space
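The slide only names Amdahl’s Law; for reference, the usual statement is below, with p the parallelizable fraction of execution time and N the number of processing units:

    \text{Speedup}(N) = \frac{1}{(1 - p) + p/N},
    \qquad \lim_{N \to \infty} \text{Speedup}(N) = \frac{1}{1 - p}

For example, p = 0.9 and N = 16 give 1 / (0.1 + 0.9/16) = 6.4; the serial 10% bounds the achievable speedup at 10x no matter how many units are added.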

Types of Parallelism and How to Exploit Them
• Instruction Level Parallelism
  – Different instructions within a stream can be executed in parallel
  – Pipelining, out-of-order execution, speculative execution, VLIW
  – Dataflow
• Data Parallelism
  – Different pieces of data can be operated on in parallel
  – SIMD: Vector processing, array processing
  – Systolic arrays, streaming processors
• Task Level Parallelism
  – Different “tasks/threads” can be executed in parallel
  – Multithreading
  – Multiprocessing (multi-core)

Task-Level Parallelism: Creating Tasks
• Partition a single problem into multiple related tasks (threads)
  – Explicitly: Parallel programming (see the sketch after this list)
    • Easy when tasks are natural in the problem
      – Web/database queries
    • Difficult when natural task boundaries are unclear
  – Transparently/implicitly: Thread level speculation
    • Partition a single thread speculatively
• Run many independent tasks (processes) together
  – Easy when there are many processes
    • Batch simulations, different users, cloud computing workloads
  – Does not improve the performance of a single task
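As a concrete instance of explicit parallel programming where task boundaries are natural, the POSIX-threads sketch below partitions an array sum into one slice per thread. All names (sum_slice, NTHREADS, partial) are illustrative choices, not from the slides.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define N 1000000          /* assumed divisible by NTHREADS */

    static double a[N];
    static double partial[NTHREADS];

    /* Each thread sums one contiguous slice of the array. */
    static void *sum_slice(void *arg) {
        long t = (long)arg;
        long lo = t * (N / NTHREADS), hi = lo + N / NTHREADS;
        double s = 0.0;
        for (long i = lo; i < hi; i++)
            s += a[i];
        partial[t] = s;        /* private slot: no synchronization needed */
        return NULL;
    }

    int main(void) {
        pthread_t tid[NTHREADS];
        for (long i = 0; i < N; i++)
            a[i] = 1.0;
        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&tid[t], NULL, sum_slice, (void *)t);
        double total = 0.0;
        for (long t = 0; t < NTHREADS; t++) {
            pthread_join(tid[t], NULL);
            total += partial[t];
        }
        printf("sum = %.0f\n", total);   /* prints 1000000 */
        return 0;
    }

Compile with cc -pthread. The partition is easy here precisely because the task boundaries (array slices) are natural in the problem.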

Multiprocessing Fundamentals

Multiprocessor Types
• Loosely coupled multiprocessors
  – No shared global memory address space
  – Multicomputer network
    • Network-based multiprocessors
  – Usually programmed via message passing (see the sketch after this list)
    • Explicit calls (send, receive) for communication
• Tightly coupled multiprocessors
  – Shared global memory address space
  – Traditional multiprocessing: symmetric multiprocessing (SMP)
    • Existing multi-core processors, multithreaded processors
  – Programming model similar to uniprocessors (i.e., multitasking uniprocessor), except
    • Operations on shared data require synchronization
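For the loosely coupled case, here is a minimal message-passing sketch using MPI (one common message-passing library; the slides name only the send/receive model, not a specific API). Rank 0 transfers an integer to rank 1 with explicit calls, since there is no shared address space to write into.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int value = 42;                 /* arbitrary example payload */
        if (rank == 0) {
            /* Explicit send: communication is visible in the program. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int received;
            MPI_Recv(&received, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", received);
        }
        MPI_Finalize();
        return 0;
    }

Run with mpirun -np 2 (assuming an MPI implementation is installed). On a tightly coupled machine the same exchange would be an ordinary store and load to shared memory, plus synchronization.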

Main Issues in Tightly-Coupled MP
• Shared memory synchronization
  – Locks, atomic operations (see the sketch after this list)
• Cache consistency
  – More commonly called cache coherence
• Ordering of memory operations
  – What should the programmer expect the hardware to provide?
• Resource sharing, contention, partitioning
• Communication: Interconnection networks
• Load imbalance
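A minimal C11 sketch of why operations on shared data require synchronization (the iteration counts are arbitrary example values): two threads increment a shared counter. A plain counter++ is a non-atomic read-modify-write and can lose updates between threads; atomic_fetch_add, or a lock around the increment, makes the result deterministic.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_long counter = 0;    /* shared data */

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 1000000; i++)
            atomic_fetch_add(&counter, 1);  /* indivisible read-modify-write */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* Always 2000000 here; with a plain long and counter++, the
           result could be smaller because increments can be lost. */
        printf("counter = %ld\n", atomic_load(&counter));
        return 0;
    }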

Aside: Hardware-based Multithreading
• Coarse grained
  – Quantum based
  – Event based (switch-on-event multithreading)
• Fine grained
  – Cycle by cycle
  – Thornton, “CDC 6600: Design of a Computer,” 1970.
  – Burton Smith, “A pipelined, shared resource MIMD computer,” ICPP 1978.
• Simultaneous
  – Can dispatch instructions from multiple threads at the same time
  – Good for improving execution unit utilization