Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course teaching resources (lecture notes), Many ways to store data (booklet)

Many ways to store data
Sébastien Ponce (sebastien.ponce@cern.ch)
CERN
Thematic CERN School of Computing 2018

Overall Course Structure

Many ways to Store Data
  - Storage devices and their specificities
  - Distributing and parallelizing storage
Preserving data
  - Data consistency
  - Data safety
Key ingredients to achieve efficient I/O
  - Synchronous vs asynchronous I/O
  - I/O optimizations and caching

Outline

1. Storage devices
  - Existing devices
  - Hierarchical storage
2. Distributed storage
  - Data distribution
  - Data federation
3. Parallelizing files' storage
  - Striping
  - Introduction to Map/Reduce
4. Conclusion

1. Storage devices
  - Existing devices
  - Hierarchical storage

A variety of storage devices

Main differences
  - Capacities from 1 GB to 10 TB per unit
  - Prices from 1x to 300x for the same capacity
  - Very different reliability
  - Very different speeds

Typical numbers in 2018

        Capacity per unit   Latency   $/TB     Speed      Reliability
  RAM   16 GB               5 ns      9000 $   10 GB/s    volatile
  SSD   500 GB              10 µs     300 $    550 MB/s   poor
  HD    6 TB                3 ms      25 $     150 MB/s   average
  Tape  10 TB               100 s     20 $     500 MB/s   good
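To see how latency and speed combine, here is a minimal sketch using the 2018 figures from the table. It assumes a deliberately simple model that is not on the slide: one latency penalty followed by a purely sequential transfer. The names `DEVICES` and `access_time_s` are illustrative.

```python
# (latency in seconds, sequential speed in bytes/second), from the table above.
DEVICES = {
    "RAM": (5e-9, 10e9),
    "SSD": (10e-6, 550e6),
    "HD": (3e-3, 150e6),
    "Tape": (100.0, 500e6),
}

def access_time_s(device, size_bytes):
    """Assumed model: a single latency hit, then a sequential transfer."""
    latency_s, speed_bytes_s = DEVICES[device]
    return latency_s + size_bytes / speed_bytes_s

if __name__ == "__main__":
    one_gb = 1e9
    for name in DEVICES:
        print(f"{name:5s} {access_time_s(name, one_gb):10.3f} s to read 1 GB")
```

Under this model, tape latency dominates for a 1 GB read, while for much larger reads its higher sequential speed lets it catch up with a hard disk.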

A variety of storage devices

You cannot have everything
[Figure: diagram with the three poles "cheap", "reliability" and "speed", on which RAM, SSD, HD and Tape are placed.]

Reliability in real world (CERN)

For disks
  - probability of losing a disk per year: a few %, up to 10%
      - with 60K disks, that is around 10 per day
      - and all files on a lost disk are lost
  - one unrecoverable bit error in 10^14 bits read/written
      - for 10 GB files, that is one file corrupted per 1000 files written

For tapes
  - probability of losing a tape per year: 10^-4
      - and you recover most of the data on it
      - net result is 10^-7 file loss per year
  - one unrecoverable bit error in 10^19 bits read/written
      - for 10 GB files, that is one file corrupted per 100M files written
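The orders of magnitude above can be checked with a short calculation. This is a minimal sketch: the 5% annual loss rate is an assumed midpoint of "a few %, up to 10%", and the function names are illustrative.

```python
BITS_PER_BYTE = 8
FILE_SIZE_BYTES = 10 * 10**9  # 10 GB file, as on the slide

def disk_losses_per_day(n_disks, annual_loss_rate):
    """Expected number of disks lost per day across the whole fleet."""
    return n_disks * annual_loss_rate / 365

def files_per_corruption(bits_per_error, file_size_bytes=FILE_SIZE_BYTES):
    """Number of files written, on average, per unrecoverable bit error."""
    return bits_per_error / (file_size_bytes * BITS_PER_BYTE)

if __name__ == "__main__":
    # 5% is an assumption, not a figure from the slide.
    print(f"disks lost per day: {disk_losses_per_day(60_000, 0.05):.1f}")
    print(f"disk: one corrupted file per {files_per_corruption(10**14):.0f} files")
    print(f"tape: one corrupted file per {files_per_corruption(10**19):.3g} files")
```

The results (roughly 8 disks per day, one file in 1250 for disks, one file in 125 million for tapes) agree with the rounded figures quoted on the slide.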

Practical Mass Storage - Real Big Data
when you count in 100s of petabytes...

The constraints
  - disks or tapes are the only possible solutions
  - disks are unreliable at that scale, and need redundancy
      - we'll see that extensively
  - tapes are cheaper for long-term storage, by a factor of 2 to 2.5
  - tape latency means data has to be accessed from disk

Specificities of tape storage

Key points
  - 500 MB/s in sequential read/write
      - 4x the speed of a disk
      - who said tape is slow?
  - latency/seek time in the order of minutes!
      - due to mount time and robot arm moving
      - due to positioning
  - storage is cheap, I/O is not
      - 20 $/TB for storage capacity
      - 25K $ for each drive

Tape efficiency

Computation

\[ \text{efficiency} = \frac{\text{I/O time}}{\text{mount time} + \text{I/O time}} \]

\[ \text{mount size} = \text{mount time} \times \text{drive speed} \]

\[ \text{efficiency} = \frac{1}{1 + \dfrac{\text{mount size}}{\text{data size}}} \]

\[ \text{mount size} \approx 50\ \text{GB} \]
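The formula can be evaluated directly. This is a minimal sketch: it assumes a mount time of 100 s (the tape latency quoted in the device table) and a drive speed of 0.5 GB/s, which together give the mount size of about 50 GB. The function names are illustrative.

```python
def mount_size_gb(mount_time_s, drive_speed_gb_s):
    """Amount of data that could have been transferred during one mount."""
    return mount_time_s * drive_speed_gb_s

def tape_efficiency(data_size_gb, mount_size=50.0):
    """Fraction of drive time spent on I/O: 1 / (1 + mount size / data size)."""
    return 1.0 / (1.0 + mount_size / data_size_gb)

if __name__ == "__main__":
    # Assumed inputs: 100 s mount time, 0.5 GB/s drive speed.
    size = mount_size_gb(100.0, 0.5)
    for data_gb in (1, 10, 50, 450):
        eff = tape_efficiency(data_gb, size)
        print(f"{data_gb:4d} GB per mount -> efficiency {eff:.2f}")
```

Reading only 1 GB per mount wastes about 98% of the drive time, and reaching 90% efficiency takes roughly 450 GB per mount, which is why requests to tape are best grouped into large batches.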