电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 06 PARALLEL COMPUTATION PATTERNS(SCAN)

LECTURE6-PARALLEL COMPUTATION PATTERNS (SCAN)

Prefix Sum A Work-inefficient Scan Kernel A Work-Efficient Parallel Scan Kernel More on Parallel Scan 电子件做女字
2 Prefix Sum A Work-inefficient Scan Kernel A Work-Efficient Parallel Scan Kernel More on Parallel Scan

Objective To master parallel scan (prefix sum)algorithms Frequently used for parallel work assignment and resource allocation A key primitive in many parallel algorithms to convert serial computation into parallel computation A foundational parallel computation pattern Work efficiency in parallel code/algorithms Reading -Mark Harris,Parallel Prefix Sum with CUDA http://http.developer.nvidia.com/GPUGems3/gpugems3 ch39.html 电子科妓女学 O
3 Objective – To master parallel scan (prefix sum) algorithms – Frequently used for parallel work assignment and resource allocation – A key primitive in many parallel algorithms to convert serial computation into parallel computation – A foundational parallel computation pattern – Work efficiency in parallel code/algorithms – Reading –Mark Harris, Parallel Prefix Sum with CUDA – http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html 3

Inclusive Scan (Prefix-Sum)Definition Definition:The scan operation takes a binary associative operator (pronounced as circle plus),and an array of n elements [oxx and returns the array [xo(x0⊕x1),(x0⊕x1⊕..⊕xn-)1 Example:If is addition,then scan operation on the array would return [31704163], [34111115162225]. 电子科效女学 O
4 Inclusive Scan (Prefix-Sum) Definition Definition: The scan operation takes a binary associative operator ⊕ (pronounced as circle plus), and an array of n elements [x0 , x1 , …, xn-1 ], and returns the array [x0 , (x0 ⊕ x1 ), …, (x0 ⊕ x1 ⊕ … ⊕ xn-1 )]. Example: If ⊕ is addition, then scan operation on the array would return [3 1 7 0 4 1 6 3], [3 4 11 11 15 16 22 25]

An Inclusive Scan Application Example Assume that we have a 100-inch sandwich to feed 10 people We know how much each person wants in inches -[35272843081] How do we cut the sandwich quickly? How much will be left? Method 1:cut the sections sequentially:3 inches first,5 inches second,2 inches third,etc. Method 2:calculate prefix sum: -[3,8,10,17,45,49,52,52,60,61](39 inches left) 电子科妓女学 O
5 An Inclusive Scan Application Example – Assume that we have a 100-inch sandwich to feed 10 people – We know how much each person wants in inches – [3 5 2 7 28 4 3 0 8 1] – How do we cut the sandwich quickly? – How much will be left? – Method 1: cut the sections sequentially: 3 inches first, 5 inches second, 2 inches third, etc. – Method 2: calculate prefix sum: – [3, 8, 10, 17, 45, 49, 52, 52, 60, 61] (39 inches left) 5

Typical Applications of Scan Scan is a simple and useful parallel building block -Convert recurrences from sequential: for(j=1;j<n;j++) out[j]out[j-1]f(j); Into parallel: forall(j)temp[j]=f(j)}; scan (out,temp); 电子科妓女学 O
6 Typical Applications of Scan – Scan is a simple and useful parallel building block – Convert recurrences from sequential: for(j=1;j<n;j++) out[j] = out[j-1] + f(j); – Into parallel: forall(j) { temp[j] = f(j) }; scan(out, temp);

Other Applications Assigning camping spots Assigning Farmer's Market spaces 一 Allocating memory to parallel threads Allocating memory buffer space for communication channels 电子科妓女学 O
7 Other Applications – Assigning camping spots – Assigning Farmer’s Market spaces – Allocating memory to parallel threads – Allocating memory buffer space for communication channels – … 7

An Inclusive Sequential Addition Scan Given a sequence [Xo,X1,X2,... Calculate output [yo,y1,y2,...] Such that yo=Xo y1=X0+X1 y2=X0+X1+X2 Using a recursive definition y%=y%-1+X 电子科妓女学 O
8 An Inclusive Sequential Addition Scan Given a sequence [x0 , x1 , x2 , ... ] Calculate output [y0 , y1 , y2 , ... ] Such that y0 = x0 y1 = x0 + x1 y2 = x0 + x1+ x2 … Using a recursive definition yi = yi − 1 + xi 8

A Work Efficient C Implementation y[0]=[0]: for(i=1;i<Max_;i++)y[0=y[i-1]+[0]: Computationally efficient: N additions needed for N elements-O(N)! Only slightly more expensive than sequential reduction. 电子科妓女学 O
9 A Work Efficient C Implementation y[0] = x[0]; for (i = 1; i < Max_i; i++) y[i] = y [i-1] + x[i]; Computationally efficient: N additions needed for N elements - O(N) ! Only slightly more expensive than sequential reduction. 9

A Naive Inclusive Parallel Scan Assign one thread to calculate each y element Have every thread to add up all x elements needed for the y element yo=Xo y1=X0+X1 y2=X0+X1+X2 "Parallel programming is easy as long as you do not care about performance." 电子科妓女学 O
10 A Naïve Inclusive Parallel Scan – Assign one thread to calculate each y element – Have every thread to add up all x elements needed for the y element y0 = x0 y1 = x0 + x1 y2 = x0 + x1+ x2 “Parallel programming is easy as long as you do not care about performance.” 10
按次数下载不扣除下载券;
注册用户24小时内重复下载只扣除一次;
顺序:VIP每日次数-->可用次数-->下载券;
- 电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 05 PARALLEL COMPUTATION PATTERNS(HISTOGRAM).pdf
- 电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 04 Performance considerations.pdf
- 电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 03 MEMORY AND DATA LOCALITY.pdf
- 电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 02 CUDA PARALLELISM MODEL.pdf
- 电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 01 Introduction To Cuda C.pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)NVIDIA CUDA C Programming Guide(Design Guide,June 2017).pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)Methods of conjugate gradients for solving linear systems.pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)NVIDIA Parallel Prefix Sum(Scan)with CUDA(April 2007).pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)Single-pass Parallel Prefix Scan with Decoupled Look-back.pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)Program Optimization Space Pruning for a Multithreaded GPU.pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)Optimization Principles and Application Performance Evaluation of a Multithreaded GPU Using CUDA.pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)Some Computer Organizations and Their Effectiveness.pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)Software and the Concurrency Revolution.pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems.pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)MPI A Message-Passing Interface Standard(Version 2.2).pdf
- 南京大学:《网络安全与入侵检测 Network Security and Intrusion Detection》课程教学资源(课件讲稿)19 Firewall Design Methods.pdf
- 南京大学:《网络安全与入侵检测 Network Security and Intrusion Detection》课程教学资源(课件讲稿)18 Web Security(SQL Injection and Cross-Site Request Forgery).pdf
- 南京大学:《网络安全与入侵检测 Network Security and Intrusion Detection》课程教学资源(课件讲稿)17 Web Security(Cookies and Cross Site Scripting,XSS).pdf
- 南京大学:《网络安全与入侵检测 Network Security and Intrusion Detection》课程教学资源(课件讲稿)16 Bloom Filter for Network Security.pdf
- 南京大学:《网络安全与入侵检测 Network Security and Intrusion Detection》课程教学资源(课件讲稿)15 Bloom Filters and its Variants.pdf
- 电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 07 JOINT CUDA-MPI PROGRAMMING.pdf
- 电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 08 Parallel Sparse Methods.pdf
- 电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 09 Parallel patterns(MERGE SORT).pdf
- 电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 10 Computational Thinking.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)课程简介(杜平安).pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第一章 绪论.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第二章 有限元法的基本原理(平面问题有限元法).pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第七章 动态分析有限元法 FEM of Dynamic Analysis.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第3~6章 其他问题有限元法.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第八章 热分析有限元法 FEM of Thermal Analysis.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第二篇 有限元建模方法 第十二章 有限元建模概述 Overview of Finite Element Modeling.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第二篇 有限元建模方法 第十一章 有限元建模的基本原则.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第二篇 有限元建模方法 第十四章 几何模型的建立.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第二篇 有限元建模方法 第十五章 单元类型及特性定义.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第二篇 有限元建模方法 第十六章 网格划分方法.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第二篇 有限元建模方法 第十七章 模型检查与处理 Model Checking and Processing.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第二篇 有限元建模方法 第十八章 边界条件的建立 Creation of Boundary Condition.pdf
- 南京大学:《高级算法 Advanced Algorithms》课程教学资源(课件讲稿)Fingerprinting.pdf
- 南京大学:《高级算法 Advanced Algorithms》课程教学资源(课件讲稿)Greedy and Local Search.pdf
- 南京大学:《高级算法 Advanced Algorithms》课程教学资源(课件讲稿)Balls into Bins.pdf