University of Science and Technology of China (中国科学技术大学): Parallel Computing (《并行计算》) Course Materials, Part III: Parallel Programming Models

Part III: Parallel Programming Models
The Implicit Model

1. Basic Concept: With this approach, programmers write code in a familiar sequential programming language, and the compiler is responsible for automatically converting it into parallel code (e.g., KAP from Kuck and Associates, FORGE from Applied Parallel Research).
2. Features:
- Simpler semantics: no deadlock; always determinate.
- Better portability, because the source remains a sequential program.
- A single thread of control makes testing, debugging, and correctness verification easier.
- Disadvantages: it is extremely difficult to develop an auto-parallelizing compiler, and automatically parallelized code is usually inefficient, as the sketch below illustrates.
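To make the difficulty concrete, here is a minimal sketch (not from the slide; the arrays and values are illustrative) of the dependence analysis an auto-parallelizing compiler such as KAP or FORGE must perform before it can transform a loop:

#include <stdio.h>
#define N 8

int main(void) {
    double a[N], b[N];
    int i;
    for (i = 0; i < N; i++) { a[i] = i; b[i] = 0.0; }

    /* Loop 1: iterations are independent, so the compiler can
     * safely execute them in parallel. */
    for (i = 0; i < N; i++)
        b[i] = 2.0 * a[i];

    /* Loop 2: a[i] depends on a[i-1] computed in the previous
     * iteration (a loop-carried dependence), so naive parallel
     * execution would be incorrect; the compiler must keep it
     * sequential or find a nontrivial transformation. */
    for (i = 1; i < N; i++)
        a[i] = a[i] + a[i-1];

    printf("b[3] = %f, a[N-1] = %f\n", b[3], a[N-1]);
    return 0;
}

Proving which loops are safe to parallelize, across whole programs with pointers and procedure calls, is what makes automatic parallelization so hard and often so conservative.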

Part III: Parallel Programming Models
The Data-Parallel Model

1. Basic Concept: The data-parallel model is the native model for SIMD machines. Data-parallel programming emphasizes local computations and data-routing operations. It can be implemented either on SIMD or on SPMD machines. Fortran 90 and HPF are examples (see the sketch after this list).
2. Features:
- Single thread: as far as control flow is concerned, a data-parallel program is just like a sequential program.
- Parallel synchronous operation on large data structures (e.g., arrays).
- Loosely synchronous: there is a synchronization after every statement.
- Single address space: all variables reside in a single address space.
- Explicit data allocation: by allocating data explicitly, users may reduce communication overhead.
- Implicit communication: users do not have to specify communication operations.
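In Fortran 90, the single array statement A = B + C adds whole arrays element-wise in one logical step. C has no data-parallel syntax, so as a hedged illustration (not from the slide; arrays a, b, c are made up), the loop below spells out what that one statement denotes; a data-parallel compiler would distribute the elements over processors and synchronize after the statement:

#include <stdio.h>
#define N 8

int main(void) {
    double a[N], b[N], c[N];
    int i;
    for (i = 0; i < N; i++) { b[i] = i; c[i] = 10.0 * i; }

    /* Equivalent of the Fortran 90 statement  A = B + C :
     * conceptually, all elements are computed at once. */
    for (i = 0; i < N; i++)
        a[i] = b[i] + c[i];

    printf("a[7] = %f\n", a[7]);
    return 0;
}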

Part III: Parallel Programming Models
The Shared-Variable Model

1. Basic Concept: Shared-variable programming is the native model for PVP, SMP, and DSM machines. There is an ANSI X3H5 standard, but the portability of programs remains problematic.
2. Features (illustrated by the sketch after this list):
- Multiple threads: a shared-variable program uses either SPMD (Single-Program-Multiple-Data) or MPMD (Multiple-Program-Multiple-Data).
- Asynchronous: each process executes at its own pace.
- Explicit synchronization: special synchronization operations (barrier, lock, critical region, event) are used.
- Single address space: all variables reside in a single address space.
- Implicit data and computation distribution: because all data reside in shared memory, there is no need to distribute data and computation explicitly.
- Implicit communication: communication is done implicitly through reading and writing shared variables.
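A minimal sketch of these two key features, assuming Pthreads as the threading package (the variable sum and function worker are illustrative, not from the slide): the threads communicate implicitly by writing and reading the shared variable, and correctness requires an explicit synchronization operation (here a lock).

#include <stdio.h>
#include <pthread.h>

long sum = 0;                       /* shared: single address space */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    long my_part = (long)arg;
    pthread_mutex_lock(&lock);      /* explicit synchronization */
    sum += my_part;                 /* implicit communication */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)100L);
    pthread_create(&t2, NULL, worker, (void *)200L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("sum = %ld\n", sum);     /* prints 300 */
    return 0;
}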

Part III: Parallel Programming Models
The Message-Passing Model

1. Basic Concept: Message-passing programming is the native model for MPP and COW machines. The portability of programs is enhanced greatly by the PVM and MPI libraries.
2. Features (illustrated by the sketch after this list):
- Multiple threads: a message-passing program uses either SPMD (Single-Program-Multiple-Data) or MPMD (Multiple-Program-Multiple-Data).
- Asynchronous: operations at different nodes proceed asynchronously.
- Explicit synchronization: special synchronization operations (barrier, lock, critical region, event) are used.
- Multiple address spaces: the processes of a parallel program reside in different address spaces.
- Explicit data mapping and workload allocation.
- Explicit communication: the processes interact by executing message-passing operations.
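A minimal sketch of the model, using MPI (the value 42 and the rank assignments are illustrative, not from the slide): each process has its own address space, so process 0 must transfer its value to process 1 with an explicit message-passing operation. Run with at least two processes, e.g. mpirun -np 2.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                 /* exists only in process 0's memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("process 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}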

Part III: Parallel Programming Models
Comparison of Parallel Programming Models

[Comparison table: not recoverable from the source extraction.]

Part III: Sample Program
π Computation

Integration formula for π:

$$\pi = \int_0^1 \frac{4}{1+x^2}\,dx \approx \sum_{i=0}^{N-1} \frac{4}{1+\left(\frac{i+0.5}{N}\right)^2} \cdot \frac{1}{N}$$

A sequential C code to compute π:

#include <stdio.h>
#define N 1000000

int main(void) {
    double local, pi = 0.0, w;
    long i;
    w = 1.0 / N;
    for (i = 0; i < N; i++) {
        local = (i + 0.5) * w;          /* midpoint of interval i */
        pi = pi + 4.0 / (1.0 + local * local);
    }
    printf("pi is %f\n", pi * w);       /* scale the sum by 1/N */
    return 0;
}

Part III: Shared-Memory Programming Standards
ANSI X3H5

+ Parallel Construct: the parallel construct specifies the parallelism of an X3H5 program. Inside a parallel construct there can be parallel blocks, parallel loops, or single-process sections:

program main        ! The program begins in sequential mode
A                   ! A is executed by only the base thread
parallel            ! Switch to parallel mode
B                   ! B is replicated by every team member
psections           ! Start a parallel block
psection
C                   ! One team member executes C
psection
D                   ! Another team member executes D
end psections       ! Wait till both C and D are completed
psingle             ! Temporarily switch to sequential mode
E                   ! E is executed by one team member
end psingle         ! Switch back to parallel mode
pdo i=1,6           ! Start a pdo construct
F(i)                ! The team members share the 6 iterations of F
end pdo no wait     ! No implicit barrier
G                   ! More replicated code
end parallel        ! Switch back to sequential mode
H                   ! H is executed by only the initial process
...                 ! There could be more parallel constructs
end

+ Implicit barrier (fence operation): located at parallel, end parallel, end psections, end pdo, and end psingle; it forces all memory accesses up to that point to become consistent.
+ Thread interaction and synchronization, including four types of synchronization variables: latch, lock, event, and ordinal.
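X3H5 itself was never widely implemented, but its constructs map closely onto OpenMP, a later standard that descended from this line of work. As an illustrative mapping (OpenMP is a swapped-in technique, not part of the original slide), the same skeleton in OpenMP C: parallel becomes omp parallel, psections becomes omp sections, psingle becomes omp single, and pdo ... no wait becomes omp for nowait. Compile with -fopenmp.

#include <stdio.h>

int main(void) {
    printf("A: sequential, base thread only\n");
    #pragma omp parallel
    {
        printf("B: replicated by every team member\n");
        #pragma omp sections
        {
            #pragma omp section
            printf("C: one team member\n");
            #pragma omp section
            printf("D: another team member\n");
        }   /* implicit barrier, like `end psections` */
        #pragma omp single
        printf("E: one team member\n");           /* like psingle */
        #pragma omp for nowait                    /* like pdo ... no wait */
        for (int i = 1; i <= 6; i++)
            printf("F(%d)\n", i);
        printf("G: more replicated code\n");
    }   /* implicit barrier, like `end parallel` */
    printf("H: sequential again\n");
    return 0;
}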

Part III: Shared-Memory Programming Standards
POSIX Threads (Pthreads)

The Pthreads standard was established by an IEEE standards committee; it is similar to Solaris Threads.

+ Thread Management Primitives:

Function Prototype | Meaning
int pthread_create(pthread_t *thread_id, pthread_attr_t *attr, void *(*my_routine)(void *), void *arg) | Create a thread
void pthread_exit(void *status) | A thread exits
int pthread_join(pthread_t thread, void **status) | Join a thread
pthread_t pthread_self(void) | Return the calling thread's ID

+ Thread Synchronization Primitives:

Function | Meaning
pthread_mutex_init(...) | Create a new mutex variable
pthread_mutex_destroy(...) | Destroy a mutex variable
pthread_mutex_lock(...) | Lock (acquire) a mutex variable
pthread_mutex_trylock(...) | Try to acquire a mutex variable
pthread_mutex_unlock(...) | Unlock (release) a mutex variable
pthread_cond_init(...) | Create a new condition variable
pthread_cond_destroy(...) | Destroy a condition variable
pthread_cond_wait(...) | Wait (block) on a condition variable
pthread_cond_timedwait(...) | Wait on a condition variable up to a time limit
pthread_cond_signal(...) | Post an event, unblock one waiting thread
pthread_cond_broadcast(...) | Post an event, unblock all waiting threads
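A small sketch exercising several of the primitives listed above (the names worker, done, and result are illustrative, not from the slide): a worker thread signals a condition variable when its result is ready, and main() blocks on that condition until it is. Compile with -lpthread.

#include <stdio.h>
#include <pthread.h>

pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;
int done = 0;
double result;

void *worker(void *arg) {
    double x = *(double *)arg;
    pthread_mutex_lock(&lock);
    result = 4.0 / (1.0 + x * x);   /* some computation */
    done = 1;
    pthread_cond_signal(&ready);    /* unblock one waiting thread */
    pthread_mutex_unlock(&lock);
    pthread_exit(NULL);
}

int main(void) {
    pthread_t tid;
    double x = 0.5;
    pthread_create(&tid, NULL, worker, &x);
    pthread_mutex_lock(&lock);
    while (!done)                   /* guard against spurious wakeups */
        pthread_cond_wait(&ready, &lock);
    pthread_mutex_unlock(&lock);
    pthread_join(tid, NULL);
    printf("result = %f\n", result);
    return 0;
}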

Part III: Shared-Memory Programming Standards
Shared-Variable Parallel Code to Compute π

The following code uses a C-like notation with parallelizing pragmas:

#define N 1000000
main() {
    double local, pi = 0.0, w;
    long i;
A:  w = 1.0 / N;
B:  #pragma parallel
    #pragma shared(pi, w)
    #pragma local(i, local)
    {
        #pragma pfor iterate(i = 0; N; 1)
        for (i = 0; i < N; i++) {
            local = (i + 0.5) * w;
            local = 4.0 / (1.0 + local * local);
            #pragma critical
            pi = pi + local;
        }
    }
    printf("pi is %f\n", pi * w);
}   /* main() */

Note that entering the critical region once per iteration serializes the updates to pi; a more efficient version would accumulate a private partial sum in each thread and add it to pi only once.

Part III: Message Passing Programming
MPI: Message Passing Interface

+ Message-passing library approach to parallel programming: a collection of processes executes a program written in a standard sequential language, augmented with calls to a library of functions to send and receive messages.
+ Computation: in the MPI programming model, a computation consists of one or more heavyweight processes that communicate by calling library routines. The number of processes in an MPI computation is normally fixed.
+ Communication mechanisms:
  - Point-to-point communication operations.
  - Collective communication operations (broadcast, summation, etc.).
+ Communicator: allows MPI programmers to define modules, so that subprograms can encapsulate communication operations.
+ Basic MPI: although MPI is a complex system with more than 200 functions, a wide range of problems can be solved using just six of them (see the sketch below).
+ MPI has both a C language binding and a Fortran language binding.
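To make the six-function claim concrete, here is a hedged sketch (not from the slide; the cyclic work distribution is one possible choice) of the running π example using only MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Send, MPI_Recv, and MPI_Finalize. Each process sums a strided subset of the N intervals and sends its partial sum to process 0.

#include <stdio.h>
#include <mpi.h>
#define N 1000000

int main(int argc, char *argv[]) {
    int rank, size, p;
    long i;
    double w = 1.0 / N, local, mypi = 0.0, pi, part;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = rank; i < N; i += size) {      /* cyclic work allocation */
        local = (i + 0.5) * w;
        mypi += 4.0 / (1.0 + local * local);
    }

    if (rank == 0) {                        /* collect partial sums */
        pi = mypi;
        for (p = 1; p < size; p++) {
            MPI_Recv(&part, 1, MPI_DOUBLE, p, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            pi += part;
        }
        printf("pi is %f\n", pi * w);
    } else {
        MPI_Send(&mypi, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}

The receive loop is exactly the hand-written form of the collective summation the slide mentions; with full MPI it collapses to a single MPI_Reduce call.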