电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 08 Parallel Sparse Methods

Lecture 8:Parallel Sparse Methods 1
Lecture 8: Parallel Sparse Methods 1

Objective To learn to regularize irregular data with Limiting variations with clamping Sorting Transposition To learn to write a high-performance SpMV kernel based on JDS transposed format To learn the key techniques for compacting input data in parallel sparse methods for reduced consumption of memory bandwidth Better utilization of on-chip memory Fewer bytes transferred to on-chip memory Better utilization of global memory Challenge:retaining regularity 2
Objective • To learn to regularize irregular data with – Limiting variations with clamping – Sorting – Transposition • To learn to write a high-performance SpMV kernel based on JDS transposed format • To learn the key techniques for compacting input data in parallel sparse methods for reduced consumption of memory bandwidth – Better utilization of on-chip memory – Fewer bytes transferred to on-chip memory – Better utilization of global memory – Challenge: retaining regularity 2

Sparse Matrix Many real-world systems are sparse in nature Linear systems described as sparse matrices Solving sparse linear systems Traditional inversion algorithms such as Gaussian elimination can create too many "fill-in"elements and explode the size of the matrix Iterative Conjugate Gradient solvers based on sparse matrix-vector multiplication is preferred Solution of PDE systems can be formulated into linear operations expressed as sparse matrix- vector multiplication 3
Sparse Matrix • Many real-world systems are sparse in nature – Linear systems described as sparse matrices • Solving sparse linear systems – Traditional inversion algorithms such as Gaussian elimination can create too many “fill-in” elements and explode the size of the matrix – Iterative Conjugate Gradient solvers based on sparse matrix-vector multiplication is preferred • Solution of PDE systems can be formulated into linear operations expressed as sparse matrixvector multiplication 3

Sparse Data Motivation for Compaction Many real-world inputs are sparse/non-uniform Signal samples, mesh models, transportation networks, communication networks,etc. 4
Sparse Data Motivation for Compaction ▪ Many real-world inputs are sparse/non-uniform ▪ Signal samples, mesh models, transportation networks, communication networks, etc. 4

Sparse Matrix in Analytics and Al Recommender systems Natural language processing Predict missing ratings Latent semantic model Group similar users/items Word embedding as input to DNN Complex network Matrix Deep learning Link prediction Factorization Model compression Vertices clustering Embedding layer Web search Tensor decomposition Match query and document In machine learning and HPC applications items X Ratings (R) sasn R NETFLIX n items amazon.com Quora Music Spotify 5
Sparse Matrix in Analytics and AI Predict missing ratings Group similar users/items Match query and document In machine learning and HPC applications Matrix Link prediction Factorization Vertices clustering Latent semantic model Word embedding as input to DNN Recommender systems Complex network Web search Natural language processing Tensor decomposition Model compression Embedding layer Deep learning Ratings (R) n items m users * * * * * * * * ≈ x Users items T x T u v X f f R 5

Science Area Number Codes Struct Unstruct Dense Sparse N- Monte FFT PIC Sig of Grids Grids Matrix Matrix Body Carlo 1/o Teams Climate and 3 CESM,GCRM, X X X Weather CM1/WRF.HOMME Plasmas/ 2 H3D(M),VPIC X X X X Magnetosphere OSIRIS,Magtail/UPIC Stellar 5 PPM,MAESTRO X X X X X X Atmospheres and CASTRO,SEDONA, Supernovae ChaNGa,MS-FLUKSS Cosmology 2 Enzo,pGADGET X Combustion/ 2 PSDNS,DISTUF X Turbulence General Relativity 2 Cactus,Harm3D. LazEV Molecular AMBER,Gromacs, X Dynamics NAMD,LAMMPS Quantum Chemistry 2 SIAL,GAMESS, NWChem Material Science 3 NEMOS,OMEN,GW, X QMCPACK Earthquakes/ 2 AWP-ODC X Seismology HERCULES,PLSQR, SPECFEM3D Quantum Chromo 1 Chroma,MILC, X X Dynamics USQCD Social Networks EPISIMDEMICS Evolution Eve Engineering/System GRIPS,Revisit X of Systems 6 Computer Science X X X X ×
Science Area Number of Teams Codes Struct Grids Unstruct Grids Dense Matrix Sparse Matrix N- Body Monte Carlo FFT PIC Sig I/O Climate and Weather 3 CESM, GCRM, CM1/WRF, HOMME X X X X X Plasmas/ Magnetosphere 2 H3D(M),VPIC, OSIRIS, Magtail/UPIC X X X X Stellar Atmospheres and Supernovae 5 PPM, MAESTRO, CASTRO, SEDONA, ChaNGa, MS-FLUKSS X X X X X X Cosmology 2 Enzo, pGADGET X X X Combustion/ Turbulence 2 PSDNS, DISTUF X X General Relativity 2 Cactus, Harm3D, LazEV X X Molecular Dynamics 4 AMBER, Gromacs, NAMD, LAMMPS X X X Quantum Chemistry 2 SIAL, GAMESS, NWChem X X X X X Material Science 3 NEMOS, OMEN, GW, QMCPACK X X X X Earthquakes/ Seismology 2 AWP-ODC, HERCULES, PLSQR, SPECFEM3D X X X X Quantum Chromo Dynamics 1 Chroma, MILC, USQCD X X X Social Networks 1 EPISIMDEMICS Evolution 1 Eve Engineering/System of Systems 1 GRIPS,Revisit X Computer Science 1 X X X X X 6

Sparse Matrix-Vector Multiplication (SpMV) X 十 A X Y Y 7
× A X + Y Y Sparse Matrix-Vector Multiplication (SpMV) 7

Challenges Compared to dense matrix multiplication,SpMV Is irregular/unstructured Has little input data reuse Key to maximal performance Maximize regularity(by reducing divergence and load imbalance) Maximize DRAM burst utilization (layout arrangement) 8
Challenges • Compared to dense matrix multiplication, SpMV – Is irregular/unstructured – Has little input data reuse • Key to maximal performance – Maximize regularity (by reducing divergence and load imbalance) – Maximize DRAM burst utilization (layout arrangement) 8

A Simple Parallel SpMV Row 0 3 0 1 0 Thread 0 Row 1 0 0 0 0 Thread 1 Row 2 0 2 4 1 Thread 2 Row 3 0 0 1 Thread 3 ·Each thread processes one row 9
Row 0 3 0 1 0 Thread 0 Row 1 0 0 0 0 Thread 1 Row 2 0 2 4 1 Thread 2 Row 3 1 0 0 1 Thread 3 A Simple Parallel SpMV • Each thread processes one row 9

Compressed Sparse Row(CSR) Format CSR Representation Row 0 Row 2 Row 3 Nonzero values data[7] {3,1, 2,4,1, 1,1} Column indices col _index[7]{0,2,1,2,3, 0,3 Row Pointers ptr[5] {0,2,2,5,7} Dense representation Row 0 3 0 Thread 0 Row 1 0 0 0 0 Thread 1 Row 2 0 2 4 1 Thread 2 Row 3 Thread 3 10
Row 0 Row 2 Row 3 Nonzero values data[7] { 3, 1, 2, 4, 1, 1, 1 } Column indices col_index[7] { 0, 2, 1, 2, 3, 0, 3 } Row Pointers ptr[5] { 0, 2, 2, 5, 7 } Compressed Sparse Row (CSR) Format 10 Row 0 3 0 1 0 Thread 0 Row 1 0 0 0 0 Thread 1 Row 2 0 2 4 1 Thread 2 Row 3 1 0 0 1 Thread 3 Dense representation CSR Representation
按次数下载不扣除下载券;
注册用户24小时内重复下载只扣除一次;
顺序:VIP每日次数-->可用次数-->下载券;
- 电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 07 JOINT CUDA-MPI PROGRAMMING.pdf
- 电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 06 PARALLEL COMPUTATION PATTERNS(SCAN).pdf
- 电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 05 PARALLEL COMPUTATION PATTERNS(HISTOGRAM).pdf
- 电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 04 Performance considerations.pdf
- 电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 03 MEMORY AND DATA LOCALITY.pdf
- 电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 02 CUDA PARALLELISM MODEL.pdf
- 电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 01 Introduction To Cuda C.pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)NVIDIA CUDA C Programming Guide(Design Guide,June 2017).pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)Methods of conjugate gradients for solving linear systems.pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)NVIDIA Parallel Prefix Sum(Scan)with CUDA(April 2007).pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)Single-pass Parallel Prefix Scan with Decoupled Look-back.pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)Program Optimization Space Pruning for a Multithreaded GPU.pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)Optimization Principles and Application Performance Evaluation of a Multithreaded GPU Using CUDA.pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)Some Computer Organizations and Their Effectiveness.pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)Software and the Concurrency Revolution.pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems.pdf
- 《GPU并行编程 GPU Parallel Programming》课程教学资源(参考文献)MPI A Message-Passing Interface Standard(Version 2.2).pdf
- 南京大学:《网络安全与入侵检测 Network Security and Intrusion Detection》课程教学资源(课件讲稿)19 Firewall Design Methods.pdf
- 南京大学:《网络安全与入侵检测 Network Security and Intrusion Detection》课程教学资源(课件讲稿)18 Web Security(SQL Injection and Cross-Site Request Forgery).pdf
- 南京大学:《网络安全与入侵检测 Network Security and Intrusion Detection》课程教学资源(课件讲稿)17 Web Security(Cookies and Cross Site Scripting,XSS).pdf
- 电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 09 Parallel patterns(MERGE SORT).pdf
- 电子科技大学:《GPU并行编程 GPU Parallel Programming》课程教学资源(课件讲稿)Lecture 10 Computational Thinking.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)课程简介(杜平安).pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第一章 绪论.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第二章 有限元法的基本原理(平面问题有限元法).pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第七章 动态分析有限元法 FEM of Dynamic Analysis.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第3~6章 其他问题有限元法.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第八章 热分析有限元法 FEM of Thermal Analysis.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第二篇 有限元建模方法 第十二章 有限元建模概述 Overview of Finite Element Modeling.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第二篇 有限元建模方法 第十一章 有限元建模的基本原则.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第二篇 有限元建模方法 第十四章 几何模型的建立.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第二篇 有限元建模方法 第十五章 单元类型及特性定义.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第二篇 有限元建模方法 第十六章 网格划分方法.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第二篇 有限元建模方法 第十七章 模型检查与处理 Model Checking and Processing.pdf
- 电子科技大学:《有限元理论与建模方法 Finite Element Analysis and Modeling》研究生课程教学资源(课件讲稿)第二篇 有限元建模方法 第十八章 边界条件的建立 Creation of Boundary Condition.pdf
- 南京大学:《高级算法 Advanced Algorithms》课程教学资源(课件讲稿)Fingerprinting.pdf
- 南京大学:《高级算法 Advanced Algorithms》课程教学资源(课件讲稿)Greedy and Local Search.pdf
- 南京大学:《高级算法 Advanced Algorithms》课程教学资源(课件讲稿)Balls into Bins.pdf
- 南京大学:《高级算法 Advanced Algorithms》课程教学资源(课件讲稿)Concentration of Measure.pdf
- 南京大学:《高级算法 Advanced Algorithms》课程教学资源(课件讲稿)Introduction(Min-Cut and Max-Cut,尹⼀通).pdf