香港中文大学:《Topics in Theoretical Computer Science》课程教学资源(PPT课件讲稿)Lecture 3 Algorithms for data streams

clMSc5706TopicsinTheoreticalcomputerScience Week 3:Streaming and Sketching Instructor:Shengyu Zhang 1
Instructor: Shengyu Zhang 1

Map Motivations and model Problem 1:Missing numbers Problem 2:Count-Min sketch Lower bounds 0( Communication complexity 2
Map ◼ Motivations and model ◼ Problem 1: Missing numbers ◼ Problem 2: Count-Min sketch ◼ Lower bounds ❑ Communication complexity 2

Motivations Big mass of data. Data comes as a stream. Cannot see future data Relatively small space."sketch" Cannot store past data Need to process each item fast. Quick update time. Examples:Phone calls,Internet packets, satellite pictures,.. 3
Motivations ◼ Big mass of data. ◼ Data comes as a stream. ❑ Cannot see future data. ◼ Relatively small space. “sketch” ❑ Cannot store past data ◼ Need to process each item fast. ❑ Quick update time. ◼ Examples: Phone calls, Internet packets, satellite pictures, … 3

Problem 1:Missing numbers A set of numbers S={1,2,...,n} n-1 of them come in a stream x1,x2,...,one number is missing. 3,25,6,19,1,10,… Task:identify which one is missing. ▣Using small space
Problem 1: Missing numbers ◼ A set of numbers 𝑆 = {1,2, … , 𝑛} ◼ 𝑛 − 1 of them come in a stream 𝑥1, 𝑥2, …, 𝑥𝑛−1; one number is missing. ◼ Task: identify which one is missing. ❑ Using small space. 4 3, 25, 6, 19,1,10,…

A simple algorithm Maintain the sum of the input numbers. sum =0 ■fori=1ton-1 sumsum xi return n(n+1 2 .-sum 5
A simple algorithm ◼ Maintain the sum of the input numbers. ◼ 𝑠𝑢𝑚 = 0 ◼ for 𝑖 = 1 to 𝑛 − 1 𝑠𝑢𝑚 = 𝑠𝑢𝑚 + 𝑥𝑖 ◼ return 𝑛 𝑛+1 2 − 𝑠𝑢𝑚 5

Space complexity sum is at mostduring the algorithm. 2 Thus it takes at most log(lg bits to write it down. Space complexity:0(1og2 n). Much smaller than storing the whole stream, which takes at least 0(nlogn). 6
Space complexity ◼ 𝑠𝑢𝑚 is at most 𝑛 𝑛+1 2 during the algorithm. ◼ Thus it takes at most log2 𝑛 𝑛+1 2 = 𝑂 log2 𝑛 bits to write it down. ◼ Space complexity:𝑂 log2 𝑛 . ◼ Much smaller than storing the whole stream, which takes at least 𝑂(𝑛 log 𝑛). 6

More complicated Now the task gets harder. n-2 of them come in a stream x1,x2,...xn-2,two numbers are missing. 3,25,6,19,1,10,… Task:identify which two are missing. ▣Using small space
More complicated ◼ Now the task gets harder. ◼ 𝑛 − 2 of them come in a stream 𝑥1, 𝑥2, …, 𝑥𝑛−2, two numbers are missing. ◼ Task: identify which two are missing. ❑ Using small space. 7 3, 25, 6, 19,1,10,…

First try Maintain the sum and product of the input numbers. sum =0;product =1 ■fori=1ton-2 sumsum xi product=product·xi n(n+1-sum,b=n!/product 2 solve equations x+y=a,xy=b ▣return(x,y) 8
First try ◼ Maintain the sum and product of the input numbers. ◼ 𝑠𝑢𝑚 = 0; 𝑝𝑟𝑜𝑑𝑢𝑐𝑡 = 1 ◼ for 𝑖 = 1 to 𝑛 − 2 𝑠𝑢𝑚 = 𝑠𝑢𝑚 + 𝑥𝑖 𝑝𝑟𝑜𝑑𝑢𝑐𝑡 = 𝑝𝑟𝑜𝑑𝑢𝑐𝑡 ⋅ 𝑥𝑖 ◼ 𝑎 = 𝑛 𝑛+1 2 − 𝑠𝑢𝑚, 𝑏 = 𝑛!/𝑝𝑟𝑜𝑑𝑢𝑐𝑡 ◼ solve equations 𝑥 + 𝑦 = 𝑎, 𝑥 ⋅ 𝑦 = 𝑏 ◼ return (𝑥, 𝑦) 8

Problem and solution Issue:product is at least (n-2)! Thus even writing down the number needs log2(n-2)!=Θ(nlog n)bits. Too much compared to 0(logn)before. How to do?
Problem and solution ◼ Issue: 𝑝𝑟𝑜𝑑𝑢𝑐𝑡 is at least 𝑛 − 2 ! ◼ Thus even writing down the number needs log2 𝑛 − 2 ! = Θ(𝑛 log 𝑛) bits. ❑ Too much compared to 𝑂(log𝑛) before. ◼ How to do? 9

Improvement Note that we don't need to maintain product We can maintain anything,as long as finally we can reconstruct the solution from the stored results. One summary that is much smaller than product:sum of squares. "Recall::12+22+…+n2= n(n+1)(2n+1) 6 10
Improvement ◼ Note that we don’t need to maintain product. ◼ We can maintain anything, as long as finally we can reconstruct the solution from the stored results. ◼ One summary that is much smaller than product: sum of squares. ◼ Recall: 1 2 + 2 2 + ⋯ + 𝑛 2 = 𝑛 𝑛+1 2𝑛+1 6 10
按次数下载不扣除下载券;
注册用户24小时内重复下载只扣除一次;
顺序:VIP每日次数-->可用次数-->下载券;
- 香港中文大学:《Topics in Theoretical Computer Science》课程教学资源(PPT课件讲稿)Lecture 2 Linear program. Examples, Simplex algorithms, primal-dual, strong duality(and a physical interpretation), application to games.pptx
- 香港中文大学:《Topics in Theoretical Computer Science》课程教学资源(PPT课件讲稿)Lecture 1 Review of basic concepts of algorithms and complexity, probability and tail bounds.pptx
- 香港中文大学:《Probability and Statistics for Engineers》课程教学资源(辅导材料)Tutorial 8:Further Topics on Random Variables 1.pptx
- 香港中文大学:《Probability and Statistics for Engineers》课程教学资源(辅导材料)Tutorial 9:Further Topics on Random Variables 2.pdf
- 香港中文大学:《Probability and Statistics for Engineers》课程教学资源(辅导材料)Tutorial 11:Limit Theorems.pptx
- 香港中文大学:《Probability and Statistics for Engineers》课程教学资源(辅导材料)Tutorial 9:Further Topics on Random Variables 2.pptx
- 香港中文大学:《Probability and Statistics for Engineers》课程教学资源(辅导材料)Tutorial 11:Limit Theorems.pdf
- 香港中文大学:《Probability and Statistics for Engineers》课程教学资源(辅导材料)Tutorial 8:Further Topics on Random Variables 1.pdf
- 香港中文大学:《Probability and Statistics for Engineers》课程教学资源(辅导材料)Tutorial 10:Limit Theorems.pptx
- 香港中文大学:《Probability and Statistics for Engineers》课程教学资源(辅导材料)Tutorial 10:Limit Theorems.pdf
- 香港中文大学:《Probability and Statistics for Engineers》课程教学资源(辅导材料)Tutorial 7:General Random Variables 3.pptx
- 香港中文大学:《Probability and Statistics for Engineers》课程教学资源(辅导材料)Tutorial 7:General Random Variables 3.pdf
- 香港中文大学:《Probability and Statistics for Engineers》课程教学资源(辅导材料)Tutorial 6:General Random Variables 2.pptx
- 香港中文大学:《Probability and Statistics for Engineers》课程教学资源(辅导材料)Tutorial 6:General Random Variables 2.pdf
- 香港中文大学:《Probability and Statistics for Engineers》课程教学资源(辅导材料)Tutorial 5:General Random Variables 1.pptx
- 香港中文大学:《Probability and Statistics for Engineers》课程教学资源(辅导材料)Tutorial 5:General Random Variables 1.pdf
- 香港中文大学:《Probability and Statistics for Engineers》课程教学资源(辅导材料)Tutorial 4:Discrete Random Variables 2.pptx
- 香港中文大学:《Probability and Statistics for Engineers》课程教学资源(辅导材料)Tutorial 4:Discrete Random Variables 2.pdf
- 香港中文大学:《Probability and Statistics for Engineers》课程教学资源(辅导材料)Tutorial 3:Discrete Random Variables 1.pptx
- 香港中文大学:《Probability and Statistics for Engineers》课程教学资源(辅导材料)Tutorial 3:Discrete Random Variables 1.pdf
- 香港中文大学:《Topics in Theoretical Computer Science》课程教学资源(PPT课件讲稿)Lecture 4 Approximation algorithms.pptx
- 香港中文大学:《Topics in Theoretical Computer Science》课程教学资源(PPT课件讲稿)Lecture 5 NP-Complete problems.pptx
- 香港中文大学:《Topics in Theoretical Computer Science》课程教学资源(PPT课件讲稿)Lecture 6 Algorithms for resource allocation.pptx
- 香港中文大学:《Topics in Theoretical Computer Science》课程教学资源(PPT课件讲稿)Lecture 8 Mechanism design.pptx
- 香港中文大学:《Topics in Theoretical Computer Science》课程教学资源(PPT课件讲稿)Lecture 9 Online algorithm.pptx
- 香港中文大学:《Topics in Theoretical Computer Science》课程教学资源(PPT课件讲稿)Lecture 10 Online learning. Expert problem.pptx
- 香港中文大学:《Topics in Theoretical Computer Science》课程教学资源(PPT课件讲稿)Lecture 11 Influence maximization on social network.pptx
- 香港中文大学:《Topics in Theoretical Computer Science》课程教学资源(PPT课件讲稿)Lecture 12 Quantum computing. Quantum algorithms, BQP, quantum non-locality, quantum games.pptx
- 香港中文大学:《Quantum Computing》课程教学资源(课件讲稿)Lecture 1 Basics.docx
- 香港中文大学:《Quantum Computing》课程教学资源(课件讲稿)Lecture 2 Shor's algorithm.docx
- 香港中文大学:《Quantum Computing》课程教学资源(课件讲稿)Lecture 3 Hidden Subgroup Problems 1.docx
- 香港中文大学:《Quantum Computing》课程教学资源(课件讲稿)Lecture 4 Hidden Subgroup Problems 2.docx
- 香港中文大学:《Quantum Computing》课程教学资源(课件讲稿)Lecture 5 Searching algorithm and quantum adversary method.docx
- 香港中文大学:《Quantum Computing》课程教学资源(课件讲稿)Lecture 6 Quantum query complexity.docx
- 香港中文大学:《Quantum Computing》课程教学资源(课件讲稿)Lecture 7 Quantum communication complexity 1.docx
- 香港中文大学:《Quantum Computing》课程教学资源(课件讲稿)Lecture 8 Quantum communication complexity 2.docx
- 香港中文大学:《Quantum Computing》课程教学资源(课件讲稿)Lecture 9 Quantum information theory 1.docx
- 香港中文大学:《Quantum Computing》课程教学资源(课件讲稿)Lecture 10 Quantum information theory 2.docx
- 香港中文大学:《Quantum Computing》课程教学资源(课件讲稿)Lecture 11 Quantum information theory 3.pdf
- 香港中文大学:《Quantum Computing》课程教学资源(课件讲稿)Lecture 12 Quantum information theory 4.pdf