Parallel Algorithms Underlying MPI Implementations
data:image/s3,"s3://crabby-images/6a92d/6a92dedc474be1078f90c0dde4b72c3daf28704c" alt=""
Parallel Algorithms Underlying MPl Implementations
Parallel Algorithms Underlying MPI Implementations
data:image/s3,"s3://crabby-images/99470/9947054e996f7c8ce85e171799e94092fb096f1b" alt=""
Parallel Algorithms Underlying MPI Implementations This chapter looks at a few of the parallel algorithms underlying the implementations of some simple MPI calls The purpose of this is not to teach you how to roll your own"versions of these routines, but rather to help you start thinking about algorithms in a parallel fashion First, the method of recursive halving and doubling, which is the algorithm underlying operations such as broadcasts and reduction operations, is discussed Then, specific examples of parallel algorithms that implement message passing are given
Parallel Algorithms Underlying MPI Implementations • This chapter looks at a few of the parallel algorithms underlying the implementations of some simple MPI calls. • The purpose of this is not to teach you how to "roll your own" versions of these routines, but rather to help you start thinking about algorithms in a parallel fashion. • First, the method of recursive halving and doubling, which is the algorithm underlying operations such as broadcasts and reduction operations, is discussed. • Then, specific examples of parallel algorithms that implement message passing are given
data:image/s3,"s3://crabby-images/0aaa8/0aaa8885e7edd742f70f5c630649b0f7d3ee5b19" alt=""
Recursive Halving and Doubling
Recursive Halving and Doubling
data:image/s3,"s3://crabby-images/0727c/0727c3703c94a6790ff1030b19913b3431b1d6e4" alt=""
Recursive Halving and Doubling To illustrate recursive halving and doubling suppose you have a vector distributed among p processors, and you need the sum of all components of the vector in each processor, i.e a sum reduction One method is to use a tree-based algorithm to compute the sum to a single processor and then broadcast the sum to every processor
Recursive Halving and Doubling • To illustrate recursive halving and doubling, suppose you have a vector distributed among p processors, and you need the sum of all components of the vector in each processor, i.e., a sum reduction. • One method is to use a tree-based algorithm to compute the sum to a single processor and then broadcast the sum to every processor
data:image/s3,"s3://crabby-images/9833a/9833ac7e493fb7464353ff1e75e2f072965773b4" alt=""
I Recursive Halving and Doubling Assume that each processor has formed the partial sum of the components of the vector that it has Step 1: Processor 2 sends its partial sum to processor 1 and processor 1 adds this partial sum to its own. Meanwhile, processor 4 sends its partial sum to processor 3 and processor 3 performs a similar summation Step 2: Processor 3 sends its partial sum, which is now the sum of the components on processors 3 and 4, to processor 1 and processor 1 adds it to its partial sum to get the final sum across all the components of the vector. At each stage of the process, the number of processes doing work is cut in half. The algorithm is depicted in the Figure 13.1 below, where the solid arrow denotes a send operation and the dotted line arrow denotes a receive operation followed by a summation
Recursive Halving and Doubling • Assume that each processor has formed the partial sum of the components of the vector that it has. • Step 1: Processor 2 sends its partial sum to processor 1 and processor 1 adds this partial sum to its own. Meanwhile, processor 4 sends its partial sum to processor 3 and processor 3 performs a similar summation. • Step 2: Processor 3 sends its partial sum, which is now the sum of the components on processors 3 and 4, to processor 1 and processor 1 adds it to its partial sum to get the final sum across all the components of the vector. • At each stage of the process, the number of processes doing work is cut in half. The algorithm is depicted in the Figure 13.1 below, where the solid arrow denotes a send operation and the dotted line arrow denotes a receive operation followed by a summation
data:image/s3,"s3://crabby-images/32512/3251286b1f61477f5b7353f579b0200d939a2af8" alt=""
I Recursive Halving and Doubling p Figure 13.1. Summation in log(N) steps
Recursive Halving and Doubling Figure 13.1. Summation in log(N) steps
data:image/s3,"s3://crabby-images/c62c0/c62c096e035bf782c3bcec054b3113bcb973764d" alt=""
I Recursive Halving and Doubling Step 3: Processor 1 then must broadcast this sum to all other processors. This broadcast operation can be done using the same communication structure as the summation but in reverse. You will see pseudocode for this at the end of this section. Note that if the total number of processors is N, then only 2 log(N)(log base 2) steps are needed to complete the operation There is an even more efficient way to finish the job in only log(N) steps. By way of example, look at the next figure containing 8 processors. At each step, processor i and processor i+k send and receive data in a pairwise fashion and then perform the summation k is iterated from 1 through N/2 in powers of 2. If the total number of processors is N, then log(N) steps are needed. As an exercise, you should write out the necessary pseudocode for this example
Recursive Halving and Doubling • Step 3: Processor 1 then must broadcast this sum to all other processors. This broadcast operation can be done using the same communication structure as the summation, but in reverse. You will see pseudocode for this at the end of this section. Note that if the total number of processors is N, then only 2 log(N) (log base 2) steps are needed to complete the operation. • There is an even more efficient way to finish the job in only log(N) steps. By way of example, look at the next figure containing 8 processors. At each step, processor i and processor i+k send and receive data in a pairwise fashion and then perform the summation. k is iterated from 1 through N/2 in powers of 2. If the total number of processors is N, then log(N) steps are needed. As an exercise, you should write out the necessary pseudocode for this example
data:image/s3,"s3://crabby-images/dfaab/dfaab8001407f177f2429f348ca1878b8de55e9d" alt=""
I Recursive Halving and Doubling pI p2 p3 p4 p5 p6 p7 Qoq9QsQo Figure 13. 2. Summation to all processors in log(N) steps
Recursive Halving and Doubling Figure 13.2. Summation to all processors in log(N) steps
data:image/s3,"s3://crabby-images/bd6dc/bd6dcd40ca805cd4d9528720f99e4a3a04a802ba" alt=""
I Recursive Halving and Doubling What about adding vectors? That is, how do you add several vectors component-wise to get a new vector? The answer is, you employ the method discussed earlier in a component-wise fashion This fascinating way to reduce the communications and to avoid abundant summations is described next. this method utilizes the recursive halving and doubling technique and is illustrated in Figure 13.3
Recursive Halving and Doubling • What about adding vectors? That is, how do you add several vectors component-wise to get a new vector? The answer is, you employ the method discussed earlier in a component-wise fashion. This fascinating way to reduce the communications and to avoid abundant summations is described next. This method utilizes the recursive halving and doubling technique and is illustrated in Figure 13.3
data:image/s3,"s3://crabby-images/593d2/593d2adb47f111b475d500acce98ab7771ce003c" alt=""
I Recursive Halving and Doubling Suppose there are 4 processors and the length of each vector is also 4 Step 1: Processor pO sends the first two components of the vector to rocessor p1, and p1 sends the last two components of the vector to pO. Then po gets the partial sums for the last two components, and p1 gets the partial sums for the first two components. So do p2 and p3 Step 2: Processor po sends the partial sum of the third component to processor p3. Processor p3 then adds to get the total sum of the third component. Similarly, processor 0, 1 and 2 find the total sums of the 4th, 2nd, and 1st components, respectively. Now the sum of the vectors are found and the components are stored in different processors Step 3: Broadcast the result using the reverse of the above communication process
Recursive Halving and Doubling • Suppose there are 4 processors and the length of each vector is also 4. • Step 1: Processor p0 sends the first two components of the vector to processor p1, and p1 sends the last two components of the vector to p0. Then p0 gets the partial sums for the last two components, and p1 gets the partial sums for the first two components. So do p2 and p3. • Step 2: Processor p0 sends the partial sum of the third component to processor p3. Processor p3 then adds to get the total sum of the third component. Similarly, processor 0,1 and 2 find the total sums of the 4th, 2nd, and 1st components, respectively. Now the sum of the vectors are found and the components are stored in different processors. • Step 3: Broadcast the result using the reverse of the above communication process
按次数下载不扣除下载券;
注册用户24小时内重复下载只扣除一次;
顺序:VIP每日次数-->可用次数-->下载券;
- 《电子商务技术》课程教学资源(PPT课件讲稿)第五章 电子商务安全技术.ppt
- 电子工业出版社:《计算机网络》课程教学资源(第五版,PPT课件讲稿)第六章 应用层(谢希仁).ppt
- 南京大学:《面向对象技术 OOT》课程教学资源(PPT课件讲稿)并发对象 Concurrent Objects.ppt
- 《数据库系统概论 An Introduction to Database System》课程教学资源(PPT课件讲稿)第六讲 关系数据理论.ppt
- 《数据结构 Data Structure》课程教学资源(PPT课件讲稿)06 非二叉树 Non-Binary Trees.ppt
- 《数字图像处理》课程教学资源(PPT课件讲稿)第5章 图像复原.ppt
- 《C语言程序设计》课程电子教案(PPT课件讲稿)Chapter 02 用C语言编写程序.ppt
- 山西国际商务职业学院:《数据库应用程序设计》课程教学资源(PPT课件)第三章 数据与数据运算.pps
- 《计算机网络》课程教学资源(PPT课件讲稿)第一章 计算机网络概述.ppt
- 《大学计算机基础》课程教学资源:作业习题.pdf
- 中国医科大学:《计算机网络实用教程》课程教学资源(PPT讲稿)高速局域网技术、交换式局域网技术、虚拟局域网技术、主要的城域网技术.ppt
- 《TCP/IP协议及其应用》课程教学资源(PPT课件讲稿)第3章 IP寻址与地址解析.ppt
- 西安交通大学:《微型计算机接口技术》课程教学资源(PPT课件讲稿)第五章 输入/输出控制接口.ppt
- 嵌入式交叉开发环境的建立(PPT实验讲稿).ppt
- 《网络搜索和挖掘关键技术 Web Search and Mining》课程教学资源(PPT讲稿)Lecture 06 Index Compression.ppt
- 清华大学:《计算机导论》课程电子教案(PPT教学课件)第1章 计算机发展简史.ppt
- 《网页设计与制作》课程教学资源(PPT课件讲稿)第一章 HTML基础.ppt
- 北京大学:文本挖掘技术(PPT讲稿)文本分类 Text Categorization.ppt
- 同济大学:《大数据分析与数据挖掘 Big Data Analysis and Mining》课程教学资源(PPT课件讲稿)K-means & EM.pptx
- 中国医科大学计算机中心:《虚拟现实与增强现实技术概论》课程教学资源(PPT课件讲稿)第3章 虚拟现实系统的输出设备.pptx
- 中国铁道出版社:《局域网技术与组网工程》课程教学资源(PPT课件讲稿)第5章 Linux网络工程.ppt
- 陕西师范大学:Neural Networks and Fuzzy Systems(PPT讲稿)Chapter 3 NEURONAL DYNAMICS II:ACTIVATION MODELS.ppt
- 《计算机系统安全》课程教学资源(PPT课件讲稿)第六章 访问控制 Access Control.ppt
- 中国科学技术大学:《现代密码学理论与实践》课程教学资源(PPT课件讲稿)第2章 传统加密技术 Classical Encryption Techniques.ppt
- 《计算机数据恢复技术》课程教学资源(PPT课件讲稿)第1章 数据恢复技术概述.ppt
- 北京大学:《高级软件工程》课程教学资源(PPT课件讲稿)第六讲 网络环境中的软件质量.ppt
- 《大学生计算机基础》课程教学资源(PPT讲稿)第三章 字处理软件(Word 2003).ppt
- 中国水利水电出版社:《计算机组装与维护实训教程》课程教学资源(PPT课件讲稿,共九章).ppt
- 上海交通大学:《软件工程 Software Engineering》课程教学资源(PPT课件讲稿)软件开发过程 Software Development Processes.pptx
- 《大型机高级系统管理技术》课程教学资源(PPT课件讲稿)第4章 作业控制子系统.ppt
- 《计算机软件及应用》课程教学资源(PPT课件讲稿)第2章 Photoshop CS入门基础.ppt
- 河南中医药大学(河南中医学院):《计算机文化》课程教学资源(PPT课件讲稿)第二章 计算机的前世今生(主讲:许成刚).ppt
- 西安电子科技大学:《现代密码学》课程教学资源(PPT课件讲稿)第四章 公钥密码(主讲:董庆宽).pptx
- 《管理信息系统原理及开发》课程教学资源(PPT课件讲稿)第3、4讲 管理信息系统的系统设计.ppsx
- 西安电子科技大学:《接入网技术及其应用》课程教学资源(PPT课件讲稿)第6章 接入网应用(徐展琦).ppt
- 《人工智能原理及应用》课程教学大纲 Artificial Intelligence Principles and Applications.doc
- 《知识发现和数据挖掘 Knowledge Discovery and Data Mining》课程教学课件(PPT讲稿)Chapter 10. Cluster Analysis:Basic Concepts and Methods.pptx
- 中国科学技术大学:《信号与图像处理基础 Signal and Image Processing》课程教学资源(PPT课件讲稿)小波分析 Wavelet Analysis(主讲:曹洋).pptx
- 《计算机网络 Computer Networking》课程教学资源(PPT课件讲稿)Chapter 6 无线和移动网络 Wireless and Mobile Networks.ppt
- 《UNIX操作系统基础》课程教学资源(PPT课件讲稿)第三章 UNIX的文件与目录.ppt