University of Electronic Science and Technology of China: GPU Parallel Programming, course teaching resources (lecture slides), Lecture 02 CUDA PARALLELISM MODEL

GPU Teaching Kit
LECTURE 2 – CUDA PARALLELISM MODEL
Multidimensional Kernel Configuration

Multidimensional Kernel Configuration
Color-to-Greyscale Image Processing Example
Blur Image Processing Example

OBJECTIVE
▪ To understand multidimensional grids
  ▪ Multi-dimensional block and thread indices
  ▪ Mapping block/thread indices to data indices

[Figure: a 2D grid of blocks, Block (0,0) through Block (2,1), indexed by blockIdx.x and blockIdx.y. Block (1,1) is expanded to show its threads, Thread (0,0) through Thread (3,2), indexed by threadIdx.x and threadIdx.y. A thread's global x index is blockIdx.x * blockDim.x + threadIdx.x, and its global y index is blockIdx.y * blockDim.y + threadIdx.y.]

A MULTI-DIMENSIONAL GRID EXAMPLE
[Figure: the host launches Kernel 1, which runs on the device as Grid 1, a 2D arrangement of Block (0,0), Block (0,1), Block (1,0), and Block (1,1). A second grid, Grid 2, is also shown. Block (1,0) is expanded to show its threads in a 3D arrangement, labeled Thread (0,0,0) through Thread (0,1,3) and (1,0,0) through (1,0,3).]

PROCESSING A PICTURE WITH A 2D GRID
[Figure: a 62×76 picture covered by a 2D grid of 16×16 blocks.]

ROW-MAJOR LAYOUT IN C/C++
A 4×4 matrix M:
    M0,0 M0,1 M0,2 M0,3
    M1,0 M1,1 M1,2 M1,3
    M2,0 M2,1 M2,2 M2,3
    M3,0 M3,1 M3,2 M3,3
is stored in memory one row after another, as the linear array M:
    M0 M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15
The linear index of an element is Row*Width+Col. For example, M2,1 is stored at index 2*4+1 = 9, that is, M9.
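The Row*Width+Col mapping can be sketched in plain C as follows. This is a minimal host-side illustration, not code from the slides; the function names `row_major_index` and `row_major_coords` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Flatten a 2D (row, col) coordinate into a 1D row-major offset.
   `width` is the number of elements in one row.
   (Hypothetical helper, not part of the slides.) */
size_t row_major_index(size_t row, size_t col, size_t width)
{
    assert(col < width);
    return row * width + col;
}

/* Inverse mapping: recover (row, col) from a 1D row-major offset.
   (Hypothetical helper, not part of the slides.) */
void row_major_coords(size_t index, size_t width, size_t *row, size_t *col)
{
    assert(width > 0);
    *row = index / width;
    *col = index % width;
}
```

With the slide's 4×4 matrix, `row_major_index(2, 1, 4)` evaluates to 9, matching M2,1 being stored at M9.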

SOURCE CODE OF A PICTUREKERNEL
Scale every pixel value by 2.0

    __global__ void PictureKernel(float* d_Pin, float* d_Pout, int height, int width)
    {
        // Calculate the row # of the d_Pin and d_Pout element
        int Row = blockIdx.y*blockDim.y + threadIdx.y;

        // Calculate the column # of the d_Pin and d_Pout element
        int Col = blockIdx.x*blockDim.x + threadIdx.x;

        // each thread computes one element of d_Pout if in range
        if ((Row < height) && (Col < width)) {
            d_Pout[Row*width+Col] = 2.0f*d_Pin[Row*width+Col];
        }
    }
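To see what the kernel computes without a GPU, the same index arithmetic can be replayed sequentially on the host. The sketch below is an assumption-laden illustration rather than part of the slides: the function name `picture_reference` and the loop variables `bx`, `by`, `tx`, `ty` (standing in for blockIdx and threadIdx) are hypothetical, and the loops visit one "thread" at a time instead of running in parallel.

```c
#include <assert.h>

/* Hypothetical sequential reference for PictureKernel.
   The four nested loops enumerate every (block, thread) pair that a
   2D launch with block_dim_x * block_dim_y threads per block would create. */
void picture_reference(const float *pin, float *pout,
                       int height, int width,
                       int block_dim_x, int block_dim_y)
{
    assert(height > 0 && width > 0);
    assert(block_dim_x > 0 && block_dim_y > 0);

    /* Enough blocks to cover the picture in each dimension. */
    int grid_dim_x = (width - 1) / block_dim_x + 1;
    int grid_dim_y = (height - 1) / block_dim_y + 1;

    for (int by = 0; by < grid_dim_y; ++by)
        for (int bx = 0; bx < grid_dim_x; ++bx)
            for (int ty = 0; ty < block_dim_y; ++ty)
                for (int tx = 0; tx < block_dim_x; ++tx) {
                    int row = by * block_dim_y + ty;  /* blockIdx.y*blockDim.y + threadIdx.y */
                    int col = bx * block_dim_x + tx;  /* blockIdx.x*blockDim.x + threadIdx.x */

                    /* Same range test as the kernel: skip out-of-range "threads". */
                    if (row < height && col < width)
                        pout[row * width + col] = 2.0f * pin[row * width + col];
                }
}
```

A reference like this can be handy for checking a kernel's output element by element on small inputs.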

HOST CODE FOR LAUNCHING PICTUREKERNEL

    // assume that the picture is m × n,
    // m pixels in y dimension and n pixels in x dimension
    // input d_Pin has been allocated on and copied to device
    // output d_Pout has been allocated on device
    …
    dim3 DimGrid((n-1)/16 + 1, (m-1)/16 + 1, 1);
    dim3 DimBlock(16, 16, 1);
    PictureKernel<<<DimGrid, DimBlock>>>(d_Pin, d_Pout, m, n);
    …
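The expression (n-1)/16 + 1 is an integer ceiling division: it gives the smallest number of 16-wide blocks that covers n pixels. A minimal C sketch of that calculation follows; the helper name `blocks_needed` is hypothetical, and the sketch assumes a non-empty extent.

```c
#include <assert.h>

/* Smallest number of blocks of `block_size` elements that covers `n` elements.
   Equivalent to ceil(n / block_size) using only integer arithmetic.
   (Hypothetical helper; assumes n >= 1.) */
unsigned int blocks_needed(unsigned int n, unsigned int block_size)
{
    assert(n > 0 && block_size > 0);
    return (n - 1) / block_size + 1;
}
```

For extents of 62 and 76 pixels with 16-wide blocks, this yields 4 and 5 blocks respectively, so a 16×16 block configuration launches 20 blocks in total. An extent that is an exact multiple of the block size, such as 16, needs no extra block: `blocks_needed(16, 16)` is 1.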

COVERING A 62×76 PICTURE WITH 16×16 BLOCKS
[Figure: the 62×76 picture overlaid with a grid of 16×16 blocks; the blocks along two edges extend past the picture boundary.]
Not all threads in a block will follow the same control flow path.
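The amount of overhang can be worked out with a little arithmetic. The C sketch below is an illustration under stated assumptions, not code from the slides: the type `coverage_t` and the function `count_coverage` are hypothetical names, and blocks are assumed to be square. It counts how many blocks lie entirely inside the picture and how many launched threads actually pass the range test.

```c
#include <assert.h>

/* Summary of how a grid of square blocks covers a picture.
   (Hypothetical type, for illustration only.) */
typedef struct {
    int  total_blocks;      /* blocks launched                         */
    int  full_blocks;       /* blocks lying entirely inside the picture */
    long launched_threads;  /* total_blocks * block_dim * block_dim     */
    long active_threads;    /* threads that pass the range test         */
} coverage_t;

/* Compute coverage statistics for a height x width picture
   tiled by block_dim x block_dim thread blocks. */
coverage_t count_coverage(int height, int width, int block_dim)
{
    assert(height > 0 && width > 0 && block_dim > 0);

    int grid_x = (width - 1) / block_dim + 1;
    int grid_y = (height - 1) / block_dim + 1;

    coverage_t c;
    c.total_blocks = grid_x * grid_y;
    /* Integer division counts only the blocks that fit completely. */
    c.full_blocks = (width / block_dim) * (height / block_dim);
    c.launched_threads = (long)c.total_blocks * block_dim * block_dim;
    /* Exactly one thread passes the range test per pixel. */
    c.active_threads = (long)height * width;
    return c;
}
```

For the 62×76 picture with 16×16 blocks, this gives 20 blocks, of which 12 lie fully inside the picture; 5120 threads are launched and 4712 of them process a pixel. In the remaining 8 edge blocks, some threads pass the `if` test and others skip it, which is the situation the slide describes.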