
Chinese Academy of Sciences: CERN Thematic School of Computing, T-CSC "Data Storage" course teaching resource (lecture notes): Many ways to store data (booklet)

Document information
Resource category: document library
Document format: PDF
Pages: 49
File size: 294.47 KB
Content summary
1 Storage devices
● Existing devices
● Hierarchical storage
2 Distributed storage
● Data distribution
● Data federation
3 Parallelizing files' storage
● Striping
● Introduction to Map/Reduce
4 Conclusion

Many ways to store data
Sébastien Ponce (sebastien.ponce@cern.ch), CERN
Thematic CERN School of Computing 2018

Overall Course Structure
Many ways to Store Data
● Storage devices and their specificities
● Distributing and parallelizing storage
Preserving data
● Data consistency
● Data safety
Key ingredients to achieve efficient I/O
● Synchronous vs asynchronous I/O
● I/O optimizations and caching

Outline
1 Storage devices
● Existing devices
● Hierarchical storage
2 Distributed storage
● Data distribution
● Data federation
3 Parallelizing files' storage
● Striping
● Introduction to Map/Reduce
4 Conclusion

1 Storage devices
● Existing devices
● Hierarchical storage

A variety of storage devices
Main differences
● Capacities from 1 GB to 10 TB per unit
● Prices from 1 to 300 for the same capacity
● Very different reliability
● Very different speeds
Typical numbers in 2018
Device   Capacity per unit   Latency   $/TB    Speed      Reliability
RAM      16 GB               5 ns      9000    10 GB/s    volatile
SSD      500 GB              10 μs     300     550 MB/s   poor
HD       6 TB                3 ms      25      150 MB/s   average
Tape     10 TB               100 s     20      500 MB/s   good
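To make the trade-offs in the table concrete, here is a minimal Python sketch that turns its figures into a read time and a media cost. The function names, the decimal units (1 TB = 1e12 bytes) and the "one latency plus one sequential transfer" access model are assumptions made for illustration; they are not part of the original slides.

```python
# Figures copied from the "Typical numbers in 2018" table.
# Tuple layout: (latency in seconds, price in $/TB, speed in bytes/s)
DEVICES = {
    "RAM":  (5e-9,  9000, 10e9),
    "SSD":  (10e-6,  300, 550e6),
    "HD":   (3e-3,    25, 150e6),
    "Tape": (100.0,   20, 500e6),
}


def read_time_seconds(device, n_bytes):
    """Latency plus sequential transfer time for one contiguous read.

    Simplified model (assumption): a single access, no caching, no queueing.
    """
    latency, _price, speed = DEVICES[device]
    return latency + n_bytes / speed


def storage_cost_dollars(device, n_bytes):
    """Media cost only, using decimal units (1 TB = 1e12 bytes)."""
    _latency, price_per_tb, _speed = DEVICES[device]
    return price_per_tb * n_bytes / 1e12


if __name__ == "__main__":
    for name in DEVICES:
        seconds = read_time_seconds(name, 1e12)      # read 1 TB
        dollars = storage_cost_dollars(name, 1e15)   # store 1 PB
        print(f"{name:5s} read 1 TB: {seconds:10.1f} s   store 1 PB: ${dollars:,.0f}")
```

Under this model, a large sequential read from tape finishes sooner than the same read from a hard disk, while a small read is dominated by the 100 s tape latency.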

A variety of storage devices
You cannot have everything
[Figure: diagram placing RAM, SSD, HD and Tape between three competing goals: cheap, reliability, speed]

Reliability in real world (CERN)
For disks
● probability of losing a disk per year: few %, up to 10%
  ○ with 60K disks, it's around 10 per day
  ○ and all files are lost
● one unrecoverable bit error in $10^{14}$ bits read/written
  ○ for 10 GB files, that's one file corrupted per 1000 files written
For tapes
● probability of losing a tape per year: $10^{-4}$
  ○ and you recover most of the data on it
  ○ net result is $10^{-7}$ file loss per year
● one unrecoverable bit error in $10^{19}$ bits read/written
  ○ for 10 GB files, that's one file corrupted per 100M files written
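The orders of magnitude quoted above can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are illustrative, "few %" is read as 5% (an assumption), and 10 GB is taken as 10e9 bytes.

```python
def disks_lost_per_day(n_disks, annual_failure_rate):
    """Expected number of disk failures per day for a fleet of n_disks."""
    return n_disks * annual_failure_rate / 365


def files_written_per_corruption(file_size_bytes, bits_per_error):
    """Average number of files written before one unrecoverable bit error."""
    bits_per_file = file_size_bytes * 8
    return bits_per_error / bits_per_file


if __name__ == "__main__":
    # "few %" is assumed to mean 5% here; 10% is the upper bound from the slide.
    print(disks_lost_per_day(60_000, 0.05))
    print(disks_lost_per_day(60_000, 0.10))
    # 10 GB files, one bad bit per 1e14 bits (disk) and per 1e19 bits (tape).
    print(files_written_per_corruption(10e9, 1e14))
    print(files_written_per_corruption(10e9, 1e19))
```

With these inputs the disk loss rate comes out between roughly 8 and 16 per day, which brackets the "around 10 per day" figure, and the corruption rates land at about one file in 1250 for disks and one in 125 million for tapes.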

Practical Mass Storage - Real Big Data
when you count in 100s of PetaBytes...
The constraints
● disks or tapes are the only possible solutions
● disks are unreliable at that scale, and need redundancy
  ○ we'll see that extensively
● tapes are cheaper long term storage by factor 2-2.5
● tape latency imposes data access on disk

Specificities of tape storage
Key points
● 500 MB/s in sequential read/write
  ○ 4x the speed of a disk
  ○ who said tape is slow?
● latency/seek time in the order of minutes!
  ○ due to mount time and robot arm moving
  ○ due to positioning
● storage is cheap, I/O is not
  ○ $20/TB for storage capacity
  ○ $25K for each drive
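The "storage is cheap, I/O is not" point can be illustrated by pricing capacity and throughput separately. The sketch below uses only the $20/TB, $25K per drive and 500 MB/s figures from the slide; the function names and the idealized assumption that drives stream continuously (no mount or positioning overhead) are mine.

```python
import math

MEDIA_PRICE_PER_TB = 20        # $ per TB of tape capacity (from the slide)
DRIVE_PRICE = 25_000           # $ per tape drive (from the slide)
DRIVE_SPEED = 500e6            # bytes/s, sequential (from the slide)


def media_cost(capacity_bytes):
    """Cost of the tape media alone, with 1 TB = 1e12 bytes."""
    return MEDIA_PRICE_PER_TB * capacity_bytes / 1e12


def drives_needed(throughput_bytes_per_s):
    """Drives required to sustain a throughput, assuming ideal streaming."""
    return math.ceil(throughput_bytes_per_s / DRIVE_SPEED)


def drive_cost(throughput_bytes_per_s):
    """Cost of the drives needed for the requested aggregate throughput."""
    return drives_needed(throughput_bytes_per_s) * DRIVE_PRICE


if __name__ == "__main__":
    print(media_cost(100e15))   # media for 100 PB
    print(drive_cost(10e9))     # drives for 10 GB/s aggregate
```

Because mount and positioning time are ignored, the drive count is a lower bound; real deployments would need more drives for the same effective throughput.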

Tape efficiency
Computation
$$\text{efficiency} = \frac{\text{I/O time}}{\text{mount time} + \text{I/O time}}$$
$$\text{mount size} = \text{mount time} \times \text{drive speed}$$
$$\text{efficiency} = \frac{1}{1 + \frac{\text{mount size}}{\text{data size}}}$$
$$\text{mount size} \approx 50\ \text{GB}$$
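The efficiency formula is easy to evaluate numerically. The following sketch assumes the 100 s latency and 500 MB/s speed from the 2018 table as defaults for mount time and drive speed; the function names are illustrative and not taken from the course material.

```python
def mount_size(mount_time_s=100.0, drive_speed=500e6):
    """Amount of data the drive could have streamed during one mount (bytes)."""
    return mount_time_s * drive_speed


def tape_efficiency(data_size, mount_time_s=100.0, drive_speed=500e6):
    """Fraction of the drive's time spent doing useful I/O for one mount."""
    io_time = data_size / drive_speed
    return io_time / (mount_time_s + io_time)


def tape_efficiency_from_sizes(data_size, mount_time_s=100.0, drive_speed=500e6):
    """Same quantity written as 1 / (1 + mount size / data size)."""
    return 1.0 / (1.0 + mount_size(mount_time_s, drive_speed) / data_size)


if __name__ == "__main__":
    for gigabytes in (1, 10, 50, 450):
        size = gigabytes * 1e9
        print(f"{gigabytes:4d} GB per mount -> efficiency {tape_efficiency(size):.3f}")
```

Reading exactly one mount size (50 GB) per mount gives an efficiency of 0.5, and reaching 0.9 requires about 450 GB per mount, which is why small, scattered tape reads waste most of the drive time.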
