Computing Center, Institute of High Energy Physics, Chinese Academy of Sciences: Data Technologies Course, CSC 2018 Data Technologies Exercises (CSC DT 2018 Introduction)

CSC 2018 Data Technologies Exercises
Andreas-Joachim Peters, CERN IT-ST

Exercises Overview
1. IO system (1st hour)
• What do you know already?
• IOPS, bandwidth, latency & blocksize
• media and their characteristics
• cache & IO optimisation strategies
• how to debug IO problems
2. Redundancy Technology (2nd hour)
• Parity for RAID technology
3. Cloud Storage Technology (3rd + 4th hour)
• Scalability, Hashing, Indexing, Deduplication

Tutorial - Exercise 1
• A common user experience: “My IO-intensive application does not run fast enough - why?”
• Three important questions to answer:
  • what performance should we expect of our IO system?
  • how can we measure limitations?
  • how can we inspect the IO of our application?
• To answer this, we need a basic understanding of the IO system and some measurement and debugging tools

Interlude before we start …
Let’s see what you already know …
Please open this anonymous online poll with your phone or laptop: http://etc.ch/WAb5
We will repeat this poll at the end of the exercises and discuss the correct answers!

Linux IO System
Local IO; measurement tools in non-virtual machines
[Figure: the Linux IO stack, from user space down to the kernel]
• User: read, write
• GLIBC
• System-call Interface (SCI)
• Virtual Filesystem Switch (VFS)
• Filesystems: XFS, EXT4, FS(x)
• Block Layer
• Device Drivers
Annotations on the figure:
• The figure marks where the meta data cache is implemented (not important for the exercises) and where the data cache is implemented (important for the exercises)
• Skip caching using direct IO
• Measurement tools: strace, vmstat, iostat

Linux Performance Tools
[Figure: map of Linux observability tools, static performance tools and perf-tools/bcc tracing tools, arranged by the component they inspect (applications, system libraries, system call interface, VFS, sockets, scheduler, virtual memory, block device interface, device drivers, CPU, DRAM, I/O controller, disks, network controller). Legible tool names include top, htop, ps, pidstat, latencytop, powertop, schedtool, tiptop, cpudist and offcputime. The local access and remote access paths are marked.]

IO Performance
• Bandwidth = IO volume / time
• IOPS = IO operations / time
• Latency = time / IO operation (the time from the start to the end of one IO)
• Blocksize = payload / operation
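The four definitions above can be applied directly to a measurement. The sketch below is only an illustration: the class name `IOMeasurement`, its field names and the sample numbers are assumptions, not part of the exercises. It also shows that bandwidth equals IOPS times blocksize, and it treats latency as elapsed time per operation, which only holds when operations are issued one at a time.

```python
from dataclasses import dataclass


@dataclass
class IOMeasurement:
    """Hypothetical container for one measured IO run (names are illustrative)."""
    bytes_transferred: int   # total IO volume in bytes
    operations: int          # number of IO operations issued
    elapsed_seconds: float   # wall-clock time of the whole run

    @property
    def bandwidth(self) -> float:
        # Bandwidth = IO volume / time (bytes per second)
        return self.bytes_transferred / self.elapsed_seconds

    @property
    def iops(self) -> float:
        # IOPS = IO operations / time
        return self.operations / self.elapsed_seconds

    @property
    def latency(self) -> float:
        # Latency = time / IO operation (valid for strictly serial operations)
        return self.elapsed_seconds / self.operations

    @property
    def blocksize(self) -> float:
        # Blocksize = payload / operation (bytes)
        return self.bytes_transferred / self.operations


# Made-up sample: 1000 operations of 4096 bytes each, finished in 2 seconds.
m = IOMeasurement(bytes_transferred=4096 * 1000, operations=1000, elapsed_seconds=2.0)

if __name__ == "__main__":
    print(f"bandwidth = {m.bandwidth:.0f} B/s")
    print(f"IOPS      = {m.iops:.0f} ops/s")
    print(f"latency   = {m.latency * 1000:.3f} ms")
    print(f"blocksize = {m.blocksize:.0f} B")
```

Because bandwidth is the product of IOPS and blocksize, a device with a fixed IOPS limit delivers more bandwidth when the application uses larger blocks.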

IO Type Categories
• Local: local storage device
• Remote: remote storage device; performance baseline often given by network
• Sequential: forward reading
• Random: seek + read
• sync / async
• Example workloads: end-user analysis, big data analysis, video streaming, selective data analysis, bulk data analysis
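The difference between sequential IO (forward reading) and random IO (seek + read) can be sketched as follows. This is a minimal illustration, not part of the exercises: the function names, the temporary file and the 4096-byte blocksize are assumptions. The file has just been written, so it is probably still in the data cache and the timings mostly reflect memory speed rather than the storage device.

```python
import os
import random
import tempfile
import time


def read_sequential(path, blocksize):
    """Forward reading: consume the file block by block from start to end."""
    total = 0
    with open(path, "rb", buffering=0) as f:   # buffering=0: no Python-level buffer
        while True:
            chunk = f.read(blocksize)
            if not chunk:
                break
            total += len(chunk)
    return total


def read_random(path, blocksize, seed=0):
    """Seek + read: visit the same blocks in a shuffled order."""
    offsets = list(range(0, os.path.getsize(path), blocksize))
    random.Random(seed).shuffle(offsets)
    total = 0
    with open(path, "rb", buffering=0) as f:
        for offset in offsets:
            f.seek(offset)
            total += len(f.read(blocksize))
    return total


def timed(func, *args):
    """Return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def demo(size=1 << 20, blocksize=4096):
    """Write a temporary file of `size` bytes and read it both ways."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(size))
        path = tmp.name
    try:
        seq_bytes, seq_time = timed(read_sequential, path, blocksize)
        rnd_bytes, rnd_time = timed(read_random, path, blocksize)
    finally:
        os.remove(path)
    return seq_bytes, rnd_bytes, seq_time, rnd_time


if __name__ == "__main__":
    seq_bytes, rnd_bytes, seq_time, rnd_time = demo()
    print(f"sequential: {seq_bytes} bytes in {seq_time:.4f} s")
    print(f"random:     {rnd_bytes} bytes in {rnd_time:.4f} s")
```

Both access patterns transfer the same payload; only the order of the offsets differs, which is what makes the comparison meaningful on a device where seeks are expensive.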

Storage Media Characteristics
• Streaming Bandwidth (MB/s): Tape 250, Disk 200, SSD 1'000, Memory 20'000
• Random IOPS (ops/s): Disk 100, SSD 100'000, Memory 2'000'000 (no value legible for Tape)
• Latency (sec): Tape 6E+01, Disk 1E-02, SSD 1E-05, Memory 5E-07
• Network Latency (RTT ms): CERN LAN 0.5, CERN to Our Hostel 70, CERN to US 100, CERN to Australia 300
Disclaimer:
• numbers are indicative for enterprise devices
• not always symmetric for RO, WO, RW
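To see what these indicative figures imply, the sketch below estimates how long it takes to read a given volume either as one stream or as small random blocks. The helper names are hypothetical, and the numbers in the two tables are one reading of the chart labels above (Disk, SSD and Memory only), so treat the results as rough orders of magnitude rather than measurements.

```python
import math

# Assumption: indicative values read from the chart labels above.
STREAMING_MB_PER_S = {"Disk": 200, "SSD": 1_000, "Memory": 20_000}
RANDOM_IOPS = {"Disk": 100, "SSD": 100_000, "Memory": 2_000_000}


def streaming_seconds(volume_mb, medium):
    """Time to stream `volume_mb` megabytes: IO volume / bandwidth."""
    return volume_mb / STREAMING_MB_PER_S[medium]


def random_seconds(volume_bytes, blocksize, medium):
    """Time to read the volume as random blocks: IO operations / IOPS."""
    operations = math.ceil(volume_bytes / blocksize)
    return operations / RANDOM_IOPS[medium]


if __name__ == "__main__":
    volume_mb = 1000          # 1 GB expressed in MB
    blocksize = 4096          # bytes per random read (illustrative choice)
    for medium in STREAMING_MB_PER_S:
        stream = streaming_seconds(volume_mb, medium)
        rand = random_seconds(volume_mb * 1_000_000, blocksize, medium)
        print(f"{medium:6s} streaming {stream:10.3f} s   random 4 KiB {rand:10.3f} s")
```

The gap between the two columns is largest for disks, which is why the access pattern of an application matters as much as the raw device bandwidth.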

Useful Linux Commands
• time: realtime, CPU and system time measurement
• dd: copy/block IO tool
  • INPUT: if=<>, for example /dev/zero
  • OUTPUT: of=<>, for example /dev/null
  • Block size: bs=
  • Block count: count=
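What a timed `dd` run measures can be imitated in a few lines of Python. The sketch below is an analogue, not a replacement for `dd`: the function name `write_zeros`, the `sync` flag and the chosen sizes are assumptions for illustration. It writes `count` blocks of `bs` zero bytes, as `dd` would with `/dev/zero` as input, and reports the resulting bandwidth.

```python
import os
import tempfile
import time


def write_zeros(path, bs, count, sync=True):
    """Write `count` blocks of `bs` zero bytes to `path` and time the run.

    With sync=True the data is flushed to the device before the clock stops;
    otherwise the result mostly measures the speed of the data cache.
    Returns (bytes written, elapsed seconds).
    """
    block = bytes(bs)                      # one block of zeros, like reading /dev/zero
    written = 0
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:
        for _ in range(count):
            written += f.write(block)
        if sync:
            os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return written, elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "zeros.bin")   # hypothetical output file
        nbytes, seconds = write_zeros(target, bs=1 << 20, count=16)
        print(f"{nbytes} bytes in {seconds:.4f} s "
              f"({nbytes / seconds / 1e6:.1f} MB/s)")
```

Running it with several values of `bs` while keeping `bs * count` constant shows how the blocksize influences the measured bandwidth.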