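Chinese Academy of Sciences: Thematic CERN School of Computing, "T-CSC Data Storage" course teaching resources (lecture notes): Preserving data (presentation)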

Preserving data
Sébastien Ponce
sebastien.ponce@cern.ch
CERN
Thematic CERN School of Computing 2018

In the previous episodes...
- We've found out how to store data efficiently
- And how to distribute it
- And even how to distribute the computation

Today: let's make sure we do not lose or corrupt our nice data!

Outline
1. Risks of data loss and corruption
2. Data consistency
   - Checksums
   - Block checksums
3. Data safety
   - Redundancy
   - Parity
   - Erasure coding
4. Conclusion

1. Risks of data loss and corruption

Risks for my data - Hardware

Some numbers for disks
- probability of losing a disk per year: a few %, up to 10%
  - with 60K disks, that's around 10 per day
  - and all files on the lost disk are gone
- one unrecoverable bit error per $10^{14}$ bits read/written
  - for 1GB files, that's one file corrupted per 10K files written

Some numbers for tapes
- probability of losing a tape per year: $10^{-4}$
  - and you recover most of the data on it
  - net result is $10^{-7}$ file loss per year
- one unrecoverable bit error per $10^{19}$ bits read/written
  - for 1GB files, that's one file corrupted per 1G files written
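The figures above follow from simple arithmetic. Below is a minimal sketch that reproduces them, assuming a 6% annual disk failure rate (an assumed value inside the "few %, up to 10%" range), decimal gigabytes (1 GB = $10^9$ bytes), and that each bit error corrupts exactly one file. The helper names are hypothetical.

```python
"""Back-of-the-envelope failure arithmetic for the hardware numbers above."""


def disks_lost_per_day(n_disks: int, annual_failure_rate: float) -> float:
    """Expected number of disk failures per day for a fleet of n_disks."""
    return n_disks * annual_failure_rate / 365


def files_per_corrupted_file(bits_per_error: int, file_size_bytes: int) -> int:
    """Average number of files written before one of them gets a bit error.

    Assumes one unrecoverable error every `bits_per_error` bits and that
    each error corrupts exactly one file.
    """
    return bits_per_error // (8 * file_size_bytes)


if __name__ == "__main__":
    GB = 10**9  # decimal gigabyte (assumption for this sketch)
    # 6% is an assumed failure rate within the "few %, up to 10%" range
    print(f"disks lost per day: {disks_lost_per_day(60_000, 0.06):.1f}")
    print(f"disk: one corrupted file per {files_per_corrupted_file(10**14, GB):,} files")
    print(f"tape: one corrupted file per {files_per_corrupted_file(10**19, GB):,} files")
```

This prints about 9.9 disks lost per day, one corrupted file per 12,500 files on disk (the "10K" above) and one per 1,250,000,000 files on tape (the "1G" above).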

Risks for my data - Software

BUGS!
- in your software
  - e.g. scheduling a transfer twice, receiving no data on the second run and overwriting the correct file with an empty one
- in your dependencies
  - e.g. the transfer protocol used does not support checksums, so data may be corrupted by TCP (its checksum is only 16 bits, so roughly one randomly corrupted packet in 65536 goes through undetected)
- in the OS or common libraries
  - e.g. libc locks not being atomic
- in the hardware, that is, in the microcode running inside it
  - e.g. RAID controllers
- in your admin tools
  - e.g. recycling a tape that was not empty
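To make the TCP limitation concrete, here is an illustrative reimplementation of the 16-bit ones' complement Internet checksum described in RFC 1071 (not code taken from any TCP stack). It has only 65536 possible values, and because it is a plain sum of 16-bit words, some corruptions, such as two words swapping places, leave it unchanged.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum, as described in RFC 1071."""
    if len(data) % 2:  # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add big-endian 16-bit words
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    original = b"ABCDEFGH"
    corrupted = b"CDABEFGH"  # first two 16-bit words swapped
    # Both lines print the same value: the corruption is not detected
    print(hex(internet_checksum(original)))
    print(hex(internet_checksum(corrupted)))
```

This is why end-to-end checksums on the whole file, computed by the application rather than trusted to the transport, are worth having.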

Risks for my data - Human factor

Real-life cases that went wrong
- "Reinstall (and wipe) old machine p23425a4752."
  "Oh no, I actually meant p42532a8779..." (bad cut and paste)
- rm -rf /top/data/alltimes /2015/04/crap
  One space too much and all data are gone...
- "Activate garbage collection on pool XYZ, it's full."
  "Wasn't it tape backed up? No? Oops..."
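The `rm` case goes wrong because the shell splits arguments on whitespace: the stray space turns one path into two deletion targets, the first being the whole `/top/data/alltimes` tree. Python's standard `shlex` module applies POSIX-shell-like splitting and shows this. The `deletion_targets` helper and the "more than one target" check are hypothetical safety measures for illustration, not an existing tool.

```python
import shlex


def deletion_targets(command: str) -> list[str]:
    """Return the path arguments of an rm command, split like a POSIX shell."""
    args = shlex.split(command)
    if not args or args[0] != "rm":
        raise ValueError("expected an rm command")
    # Drop option flags such as -rf; what remains is what rm will delete
    return [a for a in args[1:] if not a.startswith("-")]


if __name__ == "__main__":
    cmd = "rm -rf /top/data/alltimes /2015/04/crap"
    targets = deletion_targets(cmd)
    print(targets)  # ['/top/data/alltimes', '/2015/04/crap']
    if len(targets) > 1:
        print("refusing: more than one target, check for a stray space")
```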

Risks for my data - conclusion

You will lose/corrupt data!
- better to be able to know when and what
- even better if you can repair
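As a minimal sketch of "knowing when and what" and then repairing, assume a checksum is recorded when a file is written and that two replicas of the file exist. Adler-32 from Python's `zlib` module is used here purely for illustration; the helper names and the two-replica setup are hypothetical.

```python
import zlib


def checksum(data: bytes) -> int:
    """Adler-32 checksum, recorded alongside the file at write time."""
    return zlib.adler32(data)


def read_verified(replicas: list[bytearray], expected: int) -> bytes:
    """Return a replica matching the recorded checksum (detection) and
    overwrite any corrupted replicas with it (repair)."""
    good = next((bytes(r) for r in replicas if checksum(bytes(r)) == expected), None)
    if good is None:
        raise OSError("all replicas corrupted: data lost")
    for r in replicas:
        if checksum(bytes(r)) != expected:
            r[:] = good  # repair in place from the good copy
    return good


if __name__ == "__main__":
    data = b"precious physics data"
    recorded = checksum(data)  # stored at write time
    replicas = [bytearray(data), bytearray(data)]
    replicas[0][3] ^= 0x01  # simulate a single bit flip on the first replica
    print(read_verified(replicas, recorded))
    print(all(bytes(r) == data for r in replicas))  # True: replica 0 was repaired
```

A single flipped bit always changes Adler-32, so detection is guaranteed in this example; repair only works because an intact second copy exists.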
