中国高校课件下载中心 》 教学资源 》 大学文库

香港中文大学:《Topics in Theoretical Computer Science》课程教学资源(PPT课件讲稿)Lecture 3 Algorithms for data streams

文档信息
资源类别:文库
文档格式:PPTX
文档页数:33
文件大小:1.88MB
团购合买:点击进入团购
内容简介
香港中文大学:《Topics in Theoretical Computer Science》课程教学资源(PPT课件讲稿)Lecture 3 Algorithms for data streams
刷新页面文档预览

clMSc5706TopicsinTheoreticalcomputerScience Week 3:Streaming and Sketching Instructor:Shengyu Zhang 1

Instructor: Shengyu Zhang 1

Map Motivations and model Problem 1:Missing numbers Problem 2:Count-Min sketch Lower bounds 0( Communication complexity 2

Map ◼ Motivations and model ◼ Problem 1: Missing numbers ◼ Problem 2: Count-Min sketch ◼ Lower bounds ❑ Communication complexity 2

Motivations Big mass of data. Data comes as a stream. Cannot see future data Relatively small space."sketch" Cannot store past data Need to process each item fast. Quick update time. Examples:Phone calls,Internet packets, satellite pictures,.. 3

Motivations ◼ Big mass of data. ◼ Data comes as a stream. ❑ Cannot see future data. ◼ Relatively small space. “sketch” ❑ Cannot store past data ◼ Need to process each item fast. ❑ Quick update time. ◼ Examples: Phone calls, Internet packets, satellite pictures, … 3

Problem 1:Missing numbers A set of numbers S={1,2,...,n} n-1 of them come in a stream x1,x2,...,one number is missing. 3,25,6,19,1,10,… Task:identify which one is missing. ▣Using small space

Problem 1: Missing numbers ◼ A set of numbers 𝑆 = {1,2, … , 𝑛} ◼ 𝑛 − 1 of them come in a stream 𝑥1, 𝑥2, …, 𝑥𝑛−1; one number is missing. ◼ Task: identify which one is missing. ❑ Using small space. 4 3, 25, 6, 19,1,10,…

A simple algorithm Maintain the sum of the input numbers. sum =0 ■fori=1ton-1 sumsum xi return n(n+1 2 .-sum 5

A simple algorithm ◼ Maintain the sum of the input numbers. ◼ 𝑠𝑢𝑚 = 0 ◼ for 𝑖 = 1 to 𝑛 − 1 𝑠𝑢𝑚 = 𝑠𝑢𝑚 + 𝑥𝑖 ◼ return 𝑛 𝑛+1 2 − 𝑠𝑢𝑚 5

Space complexity sum is at mostduring the algorithm. 2 Thus it takes at most log(lg bits to write it down. Space complexity:0(1og2 n). Much smaller than storing the whole stream, which takes at least 0(nlogn). 6

Space complexity ◼ 𝑠𝑢𝑚 is at most 𝑛 𝑛+1 2 during the algorithm. ◼ Thus it takes at most log2 𝑛 𝑛+1 2 = 𝑂 log2 𝑛 bits to write it down. ◼ Space complexity:𝑂 log2 𝑛 . ◼ Much smaller than storing the whole stream, which takes at least 𝑂(𝑛 log 𝑛). 6

More complicated Now the task gets harder. n-2 of them come in a stream x1,x2,...xn-2,two numbers are missing. 3,25,6,19,1,10,… Task:identify which two are missing. ▣Using small space

More complicated ◼ Now the task gets harder. ◼ 𝑛 − 2 of them come in a stream 𝑥1, 𝑥2, …, 𝑥𝑛−2, two numbers are missing. ◼ Task: identify which two are missing. ❑ Using small space. 7 3, 25, 6, 19,1,10,…

First try Maintain the sum and product of the input numbers. sum =0;product =1 ■fori=1ton-2 sumsum xi product=product·xi n(n+1-sum,b=n!/product 2 solve equations x+y=a,xy=b ▣return(x,y) 8

First try ◼ Maintain the sum and product of the input numbers. ◼ 𝑠𝑢𝑚 = 0; 𝑝𝑟𝑜𝑑𝑢𝑐𝑡 = 1 ◼ for 𝑖 = 1 to 𝑛 − 2 𝑠𝑢𝑚 = 𝑠𝑢𝑚 + 𝑥𝑖 𝑝𝑟𝑜𝑑𝑢𝑐𝑡 = 𝑝𝑟𝑜𝑑𝑢𝑐𝑡 ⋅ 𝑥𝑖 ◼ 𝑎 = 𝑛 𝑛+1 2 − 𝑠𝑢𝑚, 𝑏 = 𝑛!/𝑝𝑟𝑜𝑑𝑢𝑐𝑡 ◼ solve equations 𝑥 + 𝑦 = 𝑎, 𝑥 ⋅ 𝑦 = 𝑏 ◼ return (𝑥, 𝑦) 8

Problem and solution Issue:product is at least (n-2)! Thus even writing down the number needs log2(n-2)!=Θ(nlog n)bits. Too much compared to 0(logn)before. How to do?

Problem and solution ◼ Issue: 𝑝𝑟𝑜𝑑𝑢𝑐𝑡 is at least 𝑛 − 2 ! ◼ Thus even writing down the number needs log2 𝑛 − 2 ! = Θ(𝑛 log 𝑛) bits. ❑ Too much compared to 𝑂(log𝑛) before. ◼ How to do? 9

Improvement Note that we don't need to maintain product We can maintain anything,as long as finally we can reconstruct the solution from the stored results. One summary that is much smaller than product:sum of squares. "Recall::12+22+…+n2= n(n+1)(2n+1) 6 10

Improvement ◼ Note that we don’t need to maintain product. ◼ We can maintain anything, as long as finally we can reconstruct the solution from the stored results. ◼ One summary that is much smaller than product: sum of squares. ◼ Recall: 1 2 + 2 2 + ⋯ + 𝑛 2 = 𝑛 𝑛+1 2𝑛+1 6 10

刷新页面下载完整文档
VIP每日下载上限内不扣除下载券和下载次数;
按次数下载不扣除下载券;
注册用户24小时内重复下载只扣除一次;
顺序:VIP每日次数-->可用次数-->下载券;
相关文档