Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course teaching resources (lecture notes): Structuring data for efficient I/O (booklet)

Structuring data for efficient I/O
Sébastien Ponce (sebastien.ponce@cern.ch), CERN
Thematic CERN School of Computing 2017

Overall Course Structure

Structuring Data for efficient I/O
- Data formats, data compression
- Data addressing

Many ways to Store Data
- Storage devices and their specificities
- Distributing and parallelizing storage

Preserving data
- Data consistency
- Data safety

Key ingredients to achieve efficient I/O
- Synchronous vs asynchronous I/O
- I/O optimizations and caching

Outline

1. Data format
   - Row vs Column
2. Compressing data
   - Compression algorithms
   - Efficiency and use cases
3. Data addressing
   - Hierarchical namespaces
   - Limitations
   - Flat namespaces
4. Stateful interfaces
   - POSIX
   - Limitations
   - Stateless interfaces
5. Conclusion

1. Data format: Row vs Column

Data structure by example: scenario

Scenario
- You are measuring temperatures within a piece of detector.
- You have 10K captors (sensors) and you take one measurement every minute.
- After a month, you have 432M measurements.
- That is 1.6 GB if you use single-precision floats (32 bits).
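The figures in the scenario can be checked with a few lines of arithmetic. This is only a sketch: the 30-day month is an assumption (it is the value that reproduces 432M), and the constant names are illustrative.

```python
# Scenario figures from the slide; a 30-day month is assumed.
CAPTORS = 10_000
MINUTES_PER_MONTH = 60 * 24 * 30      # 43,200 measurements per captor
BYTES_PER_FLOAT = 4                   # single precision, 32 bits

measures = CAPTORS * MINUTES_PER_MONTH
size_bytes = measures * BYTES_PER_FLOAT

print(measures)                       # 432000000
print(round(size_bytes / 2**30, 2))   # 1.61 (size in GiB)
```

The "1.6GB" on the slide matches the size expressed in binary gigabytes; in decimal units the same file is about 1.73 GB.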

Data structure by example: row storage

Naive structure
- You arrange your captors in sequential order according to the detector geometry.
- Each minute, you create a new "row" of data, with 10K floats representing the temperatures given by the captors, in that order.

Time (mn)   Captor 1   Captor 2   ...   Captor c
0           a0         b0         ...   z0
1           a1         b1         ...   z1
...         ...        ...        ...   ...
n           an         bn         ...   zn

File content: a0 b0 ... z0 a1 b1 ... z1 ... an bn ... zn
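The row layout implies a simple formula for where a given value sits in the file. The sketch below assumes little-endian 32-bit floats and uses a tiny number of captors for illustration; `row_major_bytes` and `row_offset` are hypothetical helper names, not part of the course material.

```python
import struct

N_CAPTORS = 4        # 10K in the scenario; kept tiny here
FLOAT_SIZE = 4       # bytes per single-precision float


def row_major_bytes(rows):
    """Serialize one row per minute, rows written one after another."""
    return b"".join(struct.pack(f"<{len(r)}f", *r) for r in rows)


def row_offset(minute, captor, n_captors=N_CAPTORS):
    """Byte offset of the value measured by `captor` at `minute`."""
    return (minute * n_captors + captor) * FLOAT_SIZE


# Fake temperatures: value = 10 * minute + captor index.
rows = [[10.0 * m + c for c in range(N_CAPTORS)] for m in range(3)]
data = row_major_bytes(rows)

(value,) = struct.unpack_from("<f", data, row_offset(2, 1))
print(value)  # 21.0
```

Two values of the same captor at consecutive minutes are `n_captors * FLOAT_SIZE` bytes apart, which is 40,000 bytes in the scenario.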

Data structure by example: access

Find out overheated devices at a given time
- find the offset of that time in the file
- read 10K numbers
- apply a simple filter

Access pattern: seek, read

Cost
- one seek
- one read of 10K 32-bit values

This is efficient!
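The "one seek, one read" access can be sketched against a small row-organized file. The file layout (little-endian floats, no header), the threshold value, and the function name `overheated_at` are assumptions made for the example.

```python
import os
import struct
import tempfile

N_CAPTORS = 5                      # 10K in the scenario
ROW_BYTES = N_CAPTORS * 4          # one row = one minute of measurements


def overheated_at(path, minute, threshold):
    """Return the indices of captors above `threshold` at `minute`."""
    with open(path, "rb") as f:
        f.seek(minute * ROW_BYTES)                                # one seek
        row = struct.unpack(f"<{N_CAPTORS}f", f.read(ROW_BYTES))  # one read
    return [c for c, temp in enumerate(row) if temp > threshold]  # filter


# Build a two-minute demo file in row order.
rows = [[20.0, 21.0, 22.0, 23.0, 24.0],
        [20.0, 95.0, 22.0, 80.0, 24.0]]
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for r in rows:
        tmp.write(struct.pack(f"<{N_CAPTORS}f", *r))

hot = overheated_at(tmp.name, minute=1, threshold=70.0)
os.remove(tmp.name)
print(hot)  # [1, 3]
```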

Data structure by example: access (2)

Graph the temperature evolution of a given device
- read 43.2K numbers from the file, one every 40K bytes
- graph them

Access pattern: seek, read, seek, read, seek, read, ...

Cost
- 43.2K reads of 4 bytes and 43.2K seeks!
- on top of that, the typical block size in a filesystem is 8K, so you will probably read effectively 20% of the file!
- actually, reading the whole file will be more efficient

Here the structure of our data is a killer.
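The 20% figure follows from counting the filesystem blocks touched by the scattered reads. The sketch below assumes "8k" means 8192 bytes and that every touched block is read in full; real filesystems and read-ahead policies may behave differently.

```python
ROW_BYTES = 10_000 * 4      # distance between two values of the same captor
N_ROWS = 43_200             # one row per minute for 30 days
BLOCK = 8 * 1024            # assumed filesystem block size
captor = 0                  # any captor gives the same count

# Each 4-byte read pulls in the whole block that contains it.
blocks = {(m * ROW_BYTES + captor * 4) // BLOCK for m in range(N_ROWS)}

file_size = N_ROWS * ROW_BYTES
fraction = len(blocks) * BLOCK / file_size

print(len(blocks))          # 43200: every read lands in a different block
print(round(fraction, 4))   # 0.2048, i.e. about 20% of the file
```

Since the stride (40,000 bytes) is larger than a block, no two reads share a block, so the fraction is simply the block size divided by the row size.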

Column storage

Time (mn)   Captor 1   Captor 2   ...   Captor c
0           a0         b0         ...   z0
1           a1         b1         ...   z1
...         ...        ...        ...   ...
n           an         bn         ...   zn

File content: a0 a1 ... an b0 b1 ... bn ... z0 z1 ... zn

Back to an efficient read: seek, read
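With the column layout, the full history of one captor is contiguous, so it can be fetched with a single seek and a single read. This is a minimal sketch using an in-memory `io.BytesIO` as a stand-in for the file; the helper names and the little-endian float encoding are assumptions.

```python
import io
import struct

N_CAPTORS, N_MINUTES = 3, 4
FLOAT_SIZE = 4


def column_major_bytes(rows):
    """Serialize all values of captor 0, then captor 1, and so on."""
    columns = zip(*rows)  # transpose: one tuple per captor
    return b"".join(struct.pack(f"<{N_MINUTES}f", *col) for col in columns)


def read_captor(f, captor):
    """Read the whole time series of one captor: one seek, one read."""
    f.seek(captor * N_MINUTES * FLOAT_SIZE)
    raw = f.read(N_MINUTES * FLOAT_SIZE)
    return list(struct.unpack(f"<{N_MINUTES}f", raw))


# Fake temperatures: value = 10 * minute + captor index.
rows = [[10.0 * m + c for c in range(N_CAPTORS)] for m in range(N_MINUTES)]
f = io.BytesIO(column_major_bytes(rows))

history = read_captor(f, 1)
print(history)  # [1.0, 11.0, 21.0, 31.0]
```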

Row vs column storage

Definition
- Row storage respects the internal structure of the data and puts the different items one after another in a sequence.
- Column storage breaks the internal structure of the data to collate similar pieces.

Why use column storage?
- to optimize I/O in general and avoid scattered reads
- to optimize data compression
- to optimize parallelization of processing

Drawbacks of column storage
- a column-organized file cannot be updated easily
- column storage is usually created from row storage in a post-processing phase
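The post-processing step that turns row storage into column storage amounts to a transposition of the data. The sketch below does it in memory on a tiny buffer; `rows_to_columns` is a hypothetical helper, and a real 1.6 GB file would more likely be converted in chunks.

```python
import struct


def rows_to_columns(row_data, n_captors):
    """Reorder row-organized float32 bytes into column-organized bytes."""
    n_values = len(row_data) // 4
    values = struct.unpack(f"<{n_values}f", row_data)
    out = []
    for c in range(n_captors):
        # Every n_captors-th value, starting at c, belongs to captor c.
        out.extend(values[c::n_captors])
    return struct.pack(f"<{n_values}f", *out)


# 2 minutes x 3 captors, in row order: a0 b0 c0 a1 b1 c1
row_data = struct.pack("<6f", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
col_data = rows_to_columns(row_data, n_captors=3)

print(struct.unpack("<6f", col_data))
# (1.0, 4.0, 2.0, 5.0, 3.0, 6.0), i.e. a0 a1 b0 b1 c0 c1
```

The example also hints at the update drawback: appending one more minute to the column layout means inserting a value at the end of every column, which in a flat file forces most of the data to be rewritten.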