
Improving MapReduce Performance Using Smart Speculative Execution Strategy


Qi Chen, Cheng Liu, and Zhen Xiao
Oct 2013
To appear in IEEE Transactions on Computers


Outline
1. Introduction
2. Background
3. Previous work
4. Pitfalls
5. Our Design
6. Evaluation
7. Conclusion

Introduction
- The new era of Big Data is coming!
  - Google: 20 PB per day (2008)
  - Yahoo!: 30 TB per day (2009)
  - Facebook: 60 TB per day (2010)
  - Amazon, eBay: petabytes per day
- What does big data mean?
  - Important user information
  - Significant business value

MapReduce
- What is MapReduce?
  - The most popular parallel computing model, proposed by Google
- Applications
  - Database operations: Select, Join, Group
  - Search engine: Page rank, Inverted index, Log analysis
  - Machine learning: Clustering, machine translation, Recommendation
  - Cryptanalysis
  - Scientific computation
  - …

Straggler
- What is a straggler in MapReduce?
  - A node on which tasks take an unusually long time to finish
- A straggler will:
  - Delay the job execution time
  - Degrade the cluster throughput
- How to solve it?
  - Speculative execution: a slow task is backed up on an alternative machine, in the hope that the backup finishes faster


Architecture
[Figure: The master assigns map tasks and reduce tasks to workers. The input files are divided into Split 1, Split 2, …, Split M. In the map stage, each map task processes one split and writes its output as partitions (Part 1, Part 2). In the reduce stage, each reduce task reads its partition from every map task and writes one output file (Output1, Output2).]

Programming model
- Input: (key, value) pairs
- Output: (key*, value*) pairs
- Map stage, phases Map and Combine:
  List(K1,V1) → List(K2,V2) → List(K2, List(V2))
- Reduce stage, phases Copy, Sort, and Reduce:
  List(K2, List(V2)) → Ordered(K2, List(V2)) → List(K3,V3)
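The type signatures above can be traced with a small word-count sketch in Python. This is an in-memory illustration under simplifying assumptions, not how Hadoop implements the stages: there is a single reducer, so no partitioning is shown, and the helper names `map_stage`, `reduce_stage`, `wc_map`, `wc_combine`, and `wc_reduce` are hypothetical.

```python
from collections import defaultdict


def map_stage(records, map_fn, combine_fn):
    """Run the Map and Combine phases over one input split."""
    # Map phase: List(K1, V1) -> List(K2, V2)
    mapped = []
    for k1, v1 in records:
        mapped.extend(map_fn(k1, v1))
    # Combine phase: List(K2, V2) -> List(K2, List(V2)), grouped locally
    grouped = defaultdict(list)
    for k2, v2 in mapped:
        grouped[k2].append(v2)
    return [(k2, combine_fn(k2, v2s)) for k2, v2s in grouped.items()]


def reduce_stage(map_outputs, reduce_fn):
    """Run the Copy, Sort, and Reduce phases for a single reducer."""
    # Copy phase: collect the (K2, List(V2)) entries from every map output
    copied = [pair for output in map_outputs for pair in output]
    # Sort phase: merge values of equal keys, then order by key
    merged = defaultdict(list)
    for k2, v2s in copied:
        merged[k2].extend(v2s)
    ordered = sorted(merged.items())  # Ordered(K2, List(V2))
    # Reduce phase: Ordered(K2, List(V2)) -> List(K3, V3)
    result = []
    for k2, v2s in ordered:
        result.extend(reduce_fn(k2, v2s))
    return result


# Word count: K1 = line offset, V1 = line text, K2 = K3 = word, V2 = V3 = count.
def wc_map(_offset, line):
    return [(word, 1) for word in line.split()]


def wc_combine(_word, counts):
    return [sum(counts)]  # pre-aggregate on the map side


def wc_reduce(word, counts):
    return [(word, sum(counts))]


splits = [[(0, "a b a")], [(0, "b c")]]  # two input splits, one line each
map_outputs = [map_stage(split, wc_map, wc_combine) for split in splits]
result = reduce_stage(map_outputs, wc_reduce)
print(result)  # [('a', 2), ('b', 2), ('c', 1)]
```

The combiner is optional in this sketch: returning `counts` unchanged from `wc_combine` gives the same final result but copies more values to the reducer.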

Causes of Stragglers
- Internal factors
  - The resource capacity of worker nodes is heterogeneous
  - Resource competition due to other MapReduce tasks running on the same worker node
- External factors
  - Resource competition due to co-hosted applications
  - Input data skew
  - The remote input or output source is too slow
  - Faulty hardware
