北京大学：《大规模数据处理——云计算 Mass Data Processing Cloud Computing》课程教学资源（PPT课件）MapReduce系统设计与实现 Web Search on MapReduce

点击下载完整版文档（PPT）

文档信息

资源类别：文库
文档格式：PPT
文档页数：73
文件大小：3.47MB
团购合买：点击进入团购

内容简介

Last Course Review & Reading Check Document Retrieval &Inverted Index PageRank Algorithm

刷新页面文档预览

Outline Last Course Review Reading Check Document Retrieval &Inverted Index ■PageRank Algorithm

Outline ◼ Last Course Review & Reading Check ◼ Document Retrieval &Inverted Index ◼ PageRank Algorithm

NC&IS Last Course Review

Last Course Review

ki vi k2 v2 ka va ka va ks Vs ke Ve map map map map a 1b 2 c3c6 a 5c 2 b 7 c 8 Shuffle and Sort:aggregate values by keys a 15 b27 c2368 reduce reduce reduce 12 S2 T3 S3

map map map map Shuffle and Sort: aggregate values by keys reduce reduce reduce k1 v1 k2 v2 k3 v3 k4 v4 k5 v5 k6 v6 a 1 b 2 c 3 c 6 a 5 c 2 b 7 c 8 a 1 5 b 2 7 c 2 3 6 8 r1 s1 r2 s2 r3 s3

MapReduce "Runtime" a Handles scheduling Assigns workers to map and reduce tasks Handles "data distribution" Moves processes to data Handles synchronization Gathers,sorts,and shuffles intermediate data Handles errors and faults Detects worker failures and restarts -Everything happens on top of a distributed FS (later)

MapReduce “Runtime” ◼ Handles scheduling ◼ Assigns workers to map and reduce tasks ◼ Handles “data distribution” ◼ Moves processes to data ◼ Handles synchronization ◼ Gathers, sorts, and shuffles intermediate data ◼ Handles errors and faults ◼ Detects worker failures and restarts ◼ Everything happens on top of a distributed FS (later)

Ki vi k2 v2 ka ve ka va ks vs ke ve map map map map 1b2 a c 3 c 6 a 5c 2 b 7 c 8 combine combine combine combine 1b2 a c 9 a 5c 2 partition partition partition partition Shuffle and Sort:aggregate values by keys a15 b27 c2968 reduce reduce reduce S1 2S2 3s3

combine combine combine combine a 1 b 2 c 9 a 5 c 2 b 7 c 8 partition partition partition partition map map map map k1 v1 k2 v2 k3 v3 k4 v4 k5 v5 k6 v6 a 1 b 2 c 3 c 6 a 5 c 2 b 7 c 8 Shuffle and Sort: aggregate values by keys reduce reduce reduce a 1 5 b 2 7 c 2 9 8 r1 s1 r2 s2 r3 s3 c 2 3 6 8

Word Count:Version 1 1:class MAPPER 2: method MAP(docid a,doc d) 3: H-new ASSOCIATIVEARRAY 4: for all term t∈doc d do 5: H{t}←-H{t}+1 Tally counts for entire document 6: for all term t∈Hdo 7: EMIT(term t,count H{t) Are combiners still needed?

Word Count: Version 1 Are combiners still needed?

Computing the Mean:Version 2 1:class MAPPER 2 method MAP(string t,integer r) 3: EMIT(string t,integer r) 1:class COMBINER 2: method CoMBINE(string t,integers [r1,r2,...) 3: sum←-0 4: cmt←-0 5: for all integer r E integers [r1,r2,...]do 6: sum←-sum+T 7: cnt←-cmt+1 8: EMIT(string t,pair (sum,cnt)) Separate sum and count 1:class REDUCER 2: method REDUCE(string t,pairs [(s1,c1),(s2,c2)...]) 3: sum←-0 4: cmt←-0 5: for all pair (s,c)E pairs [(s1,c1),(s2,c2)...]do 6: sum←-sum+s 7: cnt←-cnt+c 8: Tavg←-sum/cnt 9: EMIT(string t,integer ravg) Why doesn't this work?

Computing the Mean: Version 2 Why doesn’t this work?

Another Try:“Stripes” Idea:group together pairs into an associative array (a,b)→1 (a,c→2 (a,d)→5 a→{b:1,c:2,d:5,e:3,f:2} (a,e)3 (a,f)→2 Each mapper takes a sentence: Generate all co-occurring term pairs For each term,emit a{b:countp,c:countc,d:countd... Reducers perform element-wise sum of associative arrays a→{b:1,d:5,e:3} Key:cleverly-constructed data structure a→{b:1,c:2,d:2,f:2} brings together partial results a→{b:2,c:2,d:7,e:3,f:2}

◼ Idea: group together pairs into an associative array ◼ Each mapper takes a sentence: ◼ Generate all co-occurring term pairs ◼ For each term, emit a → { b: countb , c: countc , d: countd … } ◼ Reducers perform element-wise sum of associative arrays Another Try: “Stripes” (a, b) → 1 (a, c) → 2 (a, d) → 5 (a, e) → 3 (a, f) → 2 a → { b: 1, c: 2, d: 5, e: 3, f: 2 } a → { b: 1, d: 5, e: 3 } a → { b: 1, c: 2, d: 2, f: 2 } a → { b: 2, c: 2, d: 7, e: 3, f: 2 } +

f(BlA):“Pairs” (a,*)→32 Reducer holds this value in memory (a,b)→3 (a,b)→3/32 (a,b2)→12 (a,b2)→12/32 (a,b3)→7 (a,b3)→7/32 (a,b4)→1 (a,b4)→1/32 For this to work: Must emit extra(a,*for every b in mapper Must make sure all a's get sent to same reducer (use partitioner) Must make sure (a,*comes first(define sort order) Must hold state in reducer across different key-value pairs

f(B|A): “Pairs” ◼ For this to work: ◼ Must emit extra (a, *) for every bn in mapper ◼ Must make sure all a’s get sent to same reducer (use partitioner) ◼ Must make sure (a, *) comes first (define sort order) ◼ Must hold state in reducer across different key-value pairs (a, b1 ) → 3 (a, b2 ) → 12 (a, b3 ) → 7 (a, b4 ) → 1 … (a, *) → 32 (a, b1 ) → 3 / 32 (a, b2 ) → 12 / 32 (a, b3 ) → 7 / 32 (a, b4 ) → 1 / 32 … Reducer holds this value in memory

NC&IS Homework Reading

Homework Reading

共73页，可试读20页，点击继续阅读 ↓

刷新页面下载完整文档

VIP每日下载上限内不扣除下载券和下载次数；
按次数下载不扣除下载券；
注册用户24小时内重复下载只扣除一次；
顺序：VIP每日次数-->可用次数-->下载券；

点击下载完整版文档（PPT）