Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course teaching resources (lecture notes), Data storage and preservation (booklet)

Data storage and preservation
Sébastien Ponce (sebastien.ponce@cern.ch), CERN
Thematic CERN School of Computing 2019

Outline
1. Storage devices
   - Existing devices
2. Parallelizing files' storage
   - Striping
   - Introduction to Map/Reduce
3. Risks of data loss and corruption
4. Data consistency
   - Checksums
   - Practical usage
5. Data safety
   - Redundancy
   - Parity
   - Erasure coding
6. Conclusion

1. Storage devices
   - Existing devices

A variety of storage devices

Main differences
- Capacities from 1 GB to 10 TB per unit
- Prices from 1 to 300 for the same capacity
- Very different reliability
- Very different speeds

Typical numbers in 2019

Device  Capacity per unit  Latency  Price ($/TB)  Speed     Reliability
RAM     16 GB              10 ns    7000          10 GB/s   volatile
SSD     500 GB             10 µs    200           1 GB/s    poor
HD      6 TB               3 ms     25            150 MB/s  average
Tape    20 TB              100 s    20            500 MB/s  good
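To make the table more concrete, the sketch below derives two quantities from it: the time needed to read one full unit sequentially, and the price of one unit. It only reuses the 2019 figures above; the function names are illustrative, and decimal units (1 TB = 1000 GB) are an assumption.

```python
# Figures copied from the "Typical numbers in 2019" table.
# Assumption: decimal units, 1 TB = 1000 GB.
DEVICES = {
    # name: (capacity in GB, latency in s, price in $/TB, speed in GB/s)
    "RAM":  (16,    10e-9, 7000, 10.0),
    "SSD":  (500,   10e-6, 200,  1.0),
    "HD":   (6000,  3e-3,  25,   0.15),
    "Tape": (20000, 100.0, 20,   0.5),
}


def full_read_seconds(name):
    """Time to read one whole unit: one access latency plus a sequential scan."""
    capacity_gb, latency_s, _, speed_gb_s = DEVICES[name]
    return latency_s + capacity_gb / speed_gb_s


def unit_price_dollars(name):
    """Price of one unit, derived from its capacity and its $/TB figure."""
    capacity_gb, _, price_per_tb, _ = DEVICES[name]
    return capacity_gb / 1000 * price_per_tb


if __name__ == "__main__":
    for name in DEVICES:
        print(f"{name:5s} full read: {full_read_seconds(name):9.1f} s, "
              f"unit price: {unit_price_dollars(name):6.1f} $")
```

With these numbers, a full hard disk and a full tape both take about 11 hours to read, which is one reason a single device quickly becomes the bottleneck.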

A variety of storage devices: you cannot have everything

[Figure: RAM, SSD, HD and Tape positioned between three competing goals: cheap, reliability, speed.]

Risks for my data - Hardware

For disks
- probability of losing a disk per year: a few %, up to 10%
  - with 60K disks, that is around 10 disks per day
  - and all files on a lost disk are lost
- one unrecoverable bit error in 10^14 bits read/written
  - for 10 GB files, that is one file corrupted per 1000 files written

For tapes
- probability of losing a tape per year: 10^-4
  - and you recover most of the data on it
  - net result is 10^-7 file loss per year
- one unrecoverable bit error in 10^19 bits read/written
  - for 10 GB files, that is one file corrupted per 100M files written
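The orders of magnitude quoted above can be checked with a few lines of arithmetic. This is a minimal sketch; the 5% annual failure rate is an assumed value inside the slide's "few % to 10%" range, and the function names are illustrative.

```python
FILE_BITS = 10e9 * 8  # one 10 GB file expressed in bits (decimal GB assumed)


def disks_lost_per_day(n_disks, annual_failure_rate):
    """Expected number of disk failures per day in a fleet of n_disks."""
    return n_disks * annual_failure_rate / 365


def files_per_bit_error(bits_per_error, file_bits=FILE_BITS):
    """Number of files written, on average, per unrecoverable bit error."""
    return bits_per_error / file_bits


if __name__ == "__main__":
    # 60K disks at an assumed 5% annual failure rate: roughly 8 disks per day
    print(disks_lost_per_day(60_000, 0.05))
    # Disks, one error per 10^14 bits: about one 10 GB file in a thousand
    print(files_per_bit_error(1e14))
    # Tapes, one error per 10^19 bits: about one 10 GB file in 100 million
    print(files_per_bit_error(1e19))
```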

2. Parallelizing files' storage
   - Striping
   - Introduction to Map/Reduce

Why parallelize storage?

To work around limitations:
- individual device speed (think disk)
  - a file is typically stored on a single device
- network cards' speed
  - 1 Gbit networks are still present
  - network congestion on a node reduces bandwidth per stream
- core network throughput
  - switches/routers are expensive
  - machines may have less throughput than their card(s) allow(s)
- hot data congestion
  - and the black hole it can generate, as slower transfers allow more transfers to accumulate

Parallelizing through striping

Main idea
- use several devices in parallel for a single stream
- raise the limits by summing the devices' performance

Basic striping: divide and conquer for storage
- split data into chunks (aka stripes) on different devices
- access them in parallel

[Figure: chunks of files A, B and C spread over five disks]

Disk 1    Disk 2    Disk 3    Disk 4    Disk 5
File A.1  File A.2  File A.3  File A.4  File B.1
File B.2  File B.3  File C.1  File C.2  File C.3
File C.4  File C.5  File C.6
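The idea of basic striping can be sketched in a few lines. The example below is an illustration only: it assumes round-robin placement of fixed-size chunks, models each "disk" as an in-memory list, and uses hypothetical function names (`stripe`, `unstripe`). A real system would read and write the chunks concurrently.

```python
def stripe(data, n_disks, chunk_size):
    """Split data into fixed-size chunks and place them round-robin on disks.

    Each disk is modeled as a plain list of chunks (an assumption made
    for illustration; no real I/O happens here).
    """
    disks = [[] for _ in range(n_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        disks[chunk_index % n_disks].append(data[offset:offset + chunk_size])
    return disks


def unstripe(disks):
    """Reassemble the original data by reading the disks row by row."""
    chunks = []
    depth = max((len(disk) for disk in disks), default=0)
    for row in range(depth):
        for disk in disks:
            if row < len(disk):
                chunks.append(disk[row])
    return b"".join(chunks)


if __name__ == "__main__":
    data = bytes(range(23))
    disks = stripe(data, n_disks=5, chunk_size=4)
    # 23 bytes in 4-byte chunks give 6 chunks: disk 1 holds two, the others one
    print([len(disk) for disk in disks])
    print(unstripe(disks) == data)
```

Because consecutive chunks sit on different devices, a sequential read of the file can draw on all of them at once, which is where the summed performance comes from.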

RAID 0

RAID stands for "Redundant Array of Inexpensive Disks"
- a set of configurations that employ the techniques of striping, mirroring, or parity to create large reliable data stores from multiple general-purpose computer hard disk drives (Wikipedia)

Useful RAID levels
- RAID 0: striping
- RAID 1: mirroring
- RAID 5: parity
- RAID 6: double parity
See Data Safety part

Can be implemented in hardware or software
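As a rough comparison of the four levels listed above, the sketch below computes the usable capacity and the number of disk failures each level survives. It relies on the standard definitions of these levels rather than on anything stated in the slides; it assumes identical disks, treats RAID 1 as an n-way mirror, and `raid_summary` is a hypothetical helper name.

```python
# Minimum number of disks for each level, per the standard definitions.
MIN_DISKS = {0: 2, 1: 2, 5: 3, 6: 4}


def raid_summary(level, n_disks, disk_tb):
    """Return (usable capacity in TB, disk failures tolerated).

    Assumptions: all disks have the same size, and RAID 1 mirrors the
    same data on every disk of the array.
    """
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if n_disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == 0:
        return n_disks * disk_tb, 0        # pure striping, no redundancy
    if level == 1:
        return disk_tb, n_disks - 1        # every disk holds a full copy
    if level == 5:
        return (n_disks - 1) * disk_tb, 1  # one disk's worth of parity
    return (n_disks - 2) * disk_tb, 2      # RAID 6: two disks' worth of parity


if __name__ == "__main__":
    for level in (0, 1, 5, 6):
        usable, tolerated = raid_summary(level, n_disks=5, disk_tb=6)
        print(f"RAID {level}: {usable} TB usable, survives {tolerated} failure(s)")
```

RAID 0 gives the full capacity and the summed throughput of striping, but a single disk failure loses the whole array.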