Improving MapReduce Performance Using Smart Speculative Execution Strategy

Qi Chen, Cheng Liu, and Zhen Xiao
Oct 2013
To appear in IEEE Transactions on Computers

Outline
1. Introduction
2. Background
3. Previous work
4. Pitfalls
5. Our Design
6. Evaluation
7. Conclusion

Introduction
The new era of Big Data is coming!
– Google: 20 PB per day (2008)
– Yahoo!: 30 TB per day (2009)
– Facebook: 60 TB per day (2010)
– Amazon, eBay: petabytes per day
What does big data mean?
– Important user information
– Significant business value

MapReduce
What is MapReduce?
– The most popular parallel computing model, proposed by Google
Applications:
– Database operations: Select, Join, Group
– Search engine: PageRank, inverted index, log analysis
– Machine learning: clustering, machine translation, recommendation
– Cryptanalysis
– Scientific computation
– …

Straggler
What is a straggler in MapReduce?
– A node on which tasks take an unusually long time to finish
It will:
– Delay the job execution time
– Degrade the cluster throughput
How to solve it?
– Speculative execution: a slow task is backed up on an alternative machine, in the hope that the backup can finish faster
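The effect of a backup task on job execution time can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function `job_time`, the task times, and the backup launch and run times are hypothetical values chosen for illustration, not figures or code from the paper. It shows that a job ends when its slowest task ends, and that a backup helps only if it finishes before the original copy.

```python
def job_time(task_times, backup=None):
    """Return the job completion time, given per-task finish times in seconds.

    backup is an optional (task_index, launch_time, run_time) tuple describing
    one speculative copy; that task completes when either copy finishes first.
    All numbers used with this function are hypothetical.
    """
    finish = list(task_times)
    if backup is not None:
        index, launch_time, run_time = backup
        # The task is done as soon as the original or the backup is done.
        finish[index] = min(finish[index], launch_time + run_time)
    # A job is complete only when its slowest task is complete.
    return max(finish)


task_times = [10, 11, 12, 60]  # task 3 runs on a straggler node
without_backup = job_time(task_times)
with_backup = job_time(task_times, backup=(3, 15, 12))     # launched early
useless_backup = job_time(task_times, backup=(3, 55, 12))  # launched too late
print(without_backup, with_backup, useless_backup)
```

In this toy setting the early backup shortens the job, while the late one only consumes an extra slot without changing the finish time, which is why the choice of which task to back up, and when, matters.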

Architecture
[Figure: MapReduce architecture. The Master assigns map and reduce tasks to workers. Input files are divided into Split 1, Split 2, …, Split M. In the Map Stage, each Map task writes its output as partitions Part 1 and Part 2. In the Reduce Stage, each Reduce task reads its partition from every Map task and writes one of the output files (Output1, Output2).]

Programming model
❑ Input: (key, value) pairs
❑ Output: (key*, value*) pairs
Stage    Phases               Data flow
Map      Map, Combine         List(K1, V1) → List(K2, V2) → List(K2, List(V2))
Reduce   Copy, Sort, Reduce   List(K2, List(V2)) → Ordered(K2, List(V2)) → List(K3, V3)
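The data flow above can be sketched with a word count in plain Python. This is a minimal single-process illustration, not Hadoop code: the function names `map_phase`, `combine_phase`, and `reduce_stage`, and the two input splits, are assumptions made for the example.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_phase(records):
    """Map: List(K1, V1) -> List(K2, V2). K1 is a line number, V1 a line of text."""
    return [(word, 1) for _, line in records for word in line.split()]


def combine_phase(pairs):
    """Combine: List(K2, V2) -> List(K2, List(V2)), grouping values per key locally."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return list(grouped.items())


def reduce_stage(map_outputs):
    """Copy, Sort, Reduce: List(K2, List(V2)) -> Ordered(K2, List(V2)) -> List(K3, V3)."""
    # Copy: gather the grouped pairs produced by every map task.
    copied = [pair for output in map_outputs for pair in output]
    # Sort: order by key so that equal keys become adjacent.
    copied.sort(key=itemgetter(0))
    # Reduce: merge the value lists of each key and sum them.
    result = []
    for key, group in groupby(copied, key=itemgetter(0)):
        values = [v for _, value_list in group for v in value_list]
        result.append((key, sum(values)))
    return result


# Two hypothetical input splits, each handled by its own map task.
splits = [[(0, "a b a")], [(1, "b c")]]
map_outputs = [combine_phase(map_phase(split)) for split in splits]
word_counts = reduce_stage(map_outputs)
print(word_counts)  # [('a', 2), ('b', 2), ('c', 1)]
```

In a real cluster the copy phase moves data across the network and each reduce task handles only one partition of the key space; here a single reduce task handles all keys to keep the sketch short.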

Causes of Stragglers
Internal factors
✓ Resource capacity of worker nodes is heterogeneous
✓ Resource competition due to other MapReduce tasks running on the same worker node
External factors
✓ Resource competition due to co-hosted applications
✓ Input data skew
✓ Remote input or output source is too slow
✓ Faulty hardware