Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course teaching resources (lecture notes), Practical vectorization-booklet

Practical vectorization
Sébastien Ponce
sebastien.ponce@cern.ch
CERN
Thematic CERN School of Computing 2022

Outline

1 Introduction
2 Measuring vectorization
3 Vectorization Prerequisite
4 Vectorizing techniques in C++
    - Autovectorization
    - Inline assembly
    - Intrinsics
    - Compiler extensions
    - Libraries
5 What to expect?

1 Introduction

Goal of this course

- Make the theory explained by Andrzej concerning SIMD and vectorization more concrete
- Detail the impact of vectorization
    - on your code
    - on your data model
    - on actual C++ code
- Give an idea of what to expect from vectorized code

SIMD - Single Instruction Multiple Data

Concept
- Run the same operation in parallel on multiple data
- The operation is as fast as in the single-data case
- The data live in a "vector"

Practically

    A  + B  = R

    A1   B1   R1
    A2 + B2 = R2
    A3   B3   R3
    A4   B4   R4
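As a minimal sketch of the picture above, the function below adds two arrays element by element. The name `add_arrays` is hypothetical and not from the course. Whether the loop is compiled to SIMD instructions depends on the compiler and its flags, so this illustrates the data pattern rather than a guaranteed vectorized implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: R[i] = A[i] + B[i] for every element.
// Each iteration applies the same operation to independent data, which is
// the pattern a SIMD unit can execute on several elements at once.
std::vector<float> add_arrays(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());  // both inputs must have the same length
    std::vector<float> r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        r[i] = a[i] + b[i];
    }
    return r;
}
```

With four floats per input, a 128-bit vector unit could in principle perform all four additions with a single instruction.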

Promises of vectorization

Theoretical gains
- Computation speedup corresponding to the vector width
- Note that it depends on the type of data
    - float vs double
    - shorts vs ints

Various units for various vector widths

Name             Arch   nb bits    nb floats/int   nb doubles/long
SSE4 [1]         X86    128        4               2
AVX [2]          X86    256        8               4
AVX2 (FMA) [2]   X86    256        8               4
AVX512 [2]       X86    512        16              8
SVE [3]          ARM    128-2048   4-64            2-32

[1] Streaming SIMD Extensions
[2] Advanced Vector eXtension
[3] Scalable Vector Extension
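The element counts in the table follow from dividing the register width by the size of the element type. A minimal sketch of that arithmetic, where `lanes` is a hypothetical helper name and the checks assume the usual sizes of 2 bytes for short, 4 bytes for float and int, and 8 bytes for double:

```cpp
#include <cstddef>

// Hypothetical helper: number of elements of type T that fit in a
// vector register of `bits` bits.
template <typename T>
constexpr std::size_t lanes(std::size_t bits) {
    return bits / (8 * sizeof(T));
}

// Checks against the table, assuming 4-byte float and 8-byte double.
static_assert(lanes<float>(128) == 4, "128-bit register: 4 floats");
static_assert(lanes<double>(128) == 2, "128-bit register: 2 doubles");
static_assert(lanes<float>(256) == 8, "256-bit register: 8 floats");
static_assert(lanes<double>(512) == 8, "512-bit register: 8 doubles");
// Narrower types give more lanes (assuming 2-byte short).
static_assert(lanes<short>(128) == 8, "128-bit register: 8 shorts");
```

This is the theoretical upper bound on the speedup for each type; it says nothing about what a given piece of code actually achieves.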

How to know what you can use

Manually
- Look for sse, avx, etc. in your processor flags

    lscpu | egrep 'mmx|sse|avx'
    Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts
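The flags line is long, so it can help to keep only the SIMD-related entries. The sketch below does that for a flags string passed in by the caller. The function name `simd_flags` is hypothetical, and the substring match mirrors the `egrep` pattern above applied to each flag individually; it is an illustration, not part of the course material.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: keep the flags whose name contains "mmx", "sse"
// or "avx", in the order in which they appear in the input line.
std::vector<std::string> simd_flags(const std::string& flags_line) {
    std::vector<std::string> found;
    std::istringstream in(flags_line);
    std::string flag;
    while (in >> flag) {  // flags are separated by whitespace
        if (flag.find("mmx") != std::string::npos ||
            flag.find("sse") != std::string::npos ||
            flag.find("avx") != std::string::npos) {
            found.push_back(flag);
        }
    }
    return found;
}
```

On Linux x86 machines the input could be taken from the `flags` line of `/proc/cpuinfo`. Applied to the output shown above, the result contains `avx` but no `avx2` or `avx512` entries, so that machine offers at most 256-bit AVX.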

Situation for Intel processors

[Figure: instruction sets supported by successive Intel processor generations]
- Nehalem (2009), Westmere (2010), Intel Xeon Processor (legacy): SSE* (128-bit)
- Sandy Bridge (2012), Ivy Bridge (2013), Intel Xeon Processor E3/E5 family and E3 v2/E5 v2/E7 v2: AVX (256-bit), SSE*
- Haswell (2014), Broadwell (2015), Intel Xeon Processor E3 v3/E5 v3/E7 v3 and E3 v4/E5 v4/E7 v4: AVX2 (256-bit), AVX, SSE*
- Knights Corner (2012), Intel Xeon Phi Coprocessor x100: IMCI (512-bit)
- Knights Landing (2016), Intel Xeon Phi Processor x200: AVX-512F, AVX-512CD, AVX-512PF, AVX-512ER (512-bit), AVX2, AVX, SSE*
- Skylake (2017), Intel Xeon Scalable Processor Family: AVX-512F, AVX-512CD, AVX-512BW, AVX-512DQ, AVX-512VL (512-bit), AVX2, AVX, SSE*

2 Measuring vectorization

Am I using vector registers?

Yes you are
- Vector registers are also used for scalar operations
- Remember Andrzej's picture

[Figure: a vector register with one part labelled "Used" and the rest labelled "Wasted"]

Am I efficiently using vector registers?
- Here we have to look at the generated assembly code
- Looking for specific instructions
- Or for the use of specific register names
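A minimal sketch of how one might check this in practice, assuming GCC or Clang on x86-64. The function `scale` is a hypothetical example, not taken from the course. Compiling it with something like `g++ -O3 -mavx2 -S scale.cpp` writes the generated assembly to a file. Packed instructions such as `vmulps` working on `ymm` registers suggest that several floats are processed per instruction, while scalar instructions such as `mulss` or `vmulss` on `xmm` registers suggest one float at a time. The exact output depends on the compiler version and flags.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: multiply every element of x by a.
// A simple loop over contiguous data with no dependency between
// iterations, which compilers can typically autovectorize at -O3.
void scale(float* x, float a, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        x[i] *= a;
    }
}
```

On x86, the register names give the width away: `xmm` registers are 128 bits wide, `ymm` 256 bits and `zmm` 512 bits.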