Shanghai Jiao Tong University: Multicore Architecture and Parallel Computing (PPT lecture slides), Lecture 9: MapReduce

CS427 Multicore Architecture and Parallel Computing
Lecture 9: MapReduce
Prof. Li Jiang
2014/11/19

What is MapReduce
• Origin from Google, [OSDI'04]
• A simple programming model
• Functional model
• For large-scale data processing
  – Exploits large set of commodity computers
  – Executes process in distributed manner
  – Offers high availability

Motivation
• Large-Scale Data Processing
  – Want to use 1000s of CPUs
  – But don't want hassle of managing things
• MapReduce provides
  – Automatic parallelization & distribution
  – Fault tolerance
  – I/O scheduling
  – Monitoring & status updates

Benefit of MapReduce
• Map/Reduce
  – Programming model from Lisp (and other functional languages)
• Many problems can be phrased this way
• Easy to distribute across nodes
• Nice retry/failure semantics

Distributed Word Count
[Diagram: very big data → split data (×4) → count on each split → merge → merged count]
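
A single-machine sketch of this split → count → merge pipeline, assuming collections.Counter as the per-split counter and Counter addition as the merge step (the function name distributed_word_count and the chunking scheme are illustrative, not from the slides; a real deployment would place each split on a different node):

    from collections import Counter

    def distributed_word_count(lines, num_splits=4):
        # Split: carve the input into roughly equal chunks.
        size = max(1, len(lines) // num_splits)
        splits = [lines[i:i + size] for i in range(0, len(lines), size)]

        # Count: each split is counted independently (this is the parallel part).
        partial_counts = [
            Counter(word for line in split for word in line.split())
            for split in splits
        ]

        # Merge: combine the per-split counts into one merged count.
        return sum(partial_counts, Counter())

    print(distributed_word_count(["to be or not to be", "to map or to reduce"]))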

Distributed Grep
[Diagram: very big data → split data (×4) → grep on each split → matches → cat → all matches]
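
The same split → work → collect shape covers grep; a minimal sketch using Python's re module, where only the per-split step changes (distributed_grep is an illustrative name):

    import re

    def distributed_grep(lines, pattern, num_splits=4):
        # Split the input into chunks; each chunk can be grepped independently.
        size = max(1, len(lines) // num_splits)
        splits = [lines[i:i + size] for i in range(0, len(lines), size)]

        # Grep each split, then concatenate ("cat") the per-split matches.
        regex = re.compile(pattern)
        all_matches = []
        for split in splits:
            all_matches.extend(line for line in split if regex.search(line))
        return all_matches

    print(distributed_grep(["map phase", "shuffle step", "reduce phase"], r"phase"))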

Map+Reduce
• Map
  – Accepts input key/value pair
  – Emits intermediate key/value pair
• Reduce
  – Accepts intermediate key/value* pair
  – Emits output key/value pair
[Diagram: very big data → MAP → partitioning function → REDUCE → result]
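
The partitioning function in the diagram decides which of the R reduce tasks receives each intermediate key; a minimal sketch assuming the common hash(key) mod R scheme (the name partition and the CRC32 choice are illustrative):

    import zlib

    def partition(key, num_reducers):
        # A stable hash ensures every mapper sends the same key
        # to the same reduce task, regardless of which machine runs it.
        return zlib.crc32(key.encode("utf-8")) % num_reducers

    print(partition("apple", 4), partition("banana", 4))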

Map+Reduce
• map(key, val) is run on each item in set
  – emits new-key / new-val pairs
• reduce(key, vals) is run for each unique key emitted by map()
  – emits final output
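
This execution model fits in a few lines of sequential code; a minimal single-process sketch (map_reduce and its parameters are illustrative, not any real framework's API):

    from collections import defaultdict

    def map_reduce(inputs, map_fn, reduce_fn):
        # Map phase: run map_fn on each (key, val) item in the input set.
        intermediate = defaultdict(list)
        for key, val in inputs:
            for new_key, new_val in map_fn(key, val):
                intermediate[new_key].append(new_val)

        # Reduce phase: run reduce_fn once for each unique key emitted by map.
        return [reduce_fn(key, vals) for key, vals in intermediate.items()]

In a real runtime the map calls are sharded across machines, the grouping step becomes the distributed "shuffle", and a partitioning function like the one sketched earlier routes each key to its reducer.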

Square Sum
• (map f list [list2 list3 …])
• (map square '(1 2 3 4))
  – (1 4 9 16)
• (reduce + '(1 4 9 16))
  – (+ 16 (+ 9 (+ 4 1)))
  – 30
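
The same Lisp-style composition can be reproduced with Python's built-in map and functools.reduce; a small sketch for comparison:

    from functools import reduce
    import operator

    squares = list(map(lambda x: x * x, [1, 2, 3, 4]))  # [1, 4, 9, 16]
    total = reduce(operator.add, squares)                # ((1 + 4) + 9) + 16 = 30
    print(squares, total)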

Word Count
• Input consists of (url, contents) pairs
• map(key=url, val=contents):
  – For each word w in contents, emit (w, "1")
• reduce(key=word, values=uniq_counts):
  – Sum all "1"s in values list
  – Emit result (word, sum)
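
The pseudocode above maps directly onto runnable code; a self-contained sketch with an inline group-by standing in for the shuffle (word_count_map and word_count_reduce are illustrative names):

    from collections import defaultdict

    def word_count_map(url, contents):
        # For each word w in contents, emit (w, 1).
        for word in contents.split():
            yield word, 1

    def word_count_reduce(word, counts):
        # Sum the emitted counts for one word and emit (word, sum).
        return word, sum(counts)

    docs = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog")]
    grouped = defaultdict(list)
    for url, contents in docs:
        for word, count in word_count_map(url, contents):
            grouped[word].append(count)
    print([word_count_reduce(w, c) for w, c in grouped.items()])
    # [('the', 2), ('quick', 1), ('brown', 1), ('fox', 1), ('lazy', 1), ('dog', 1)]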
