University of Electronic Science and Technology of China: Big Data Analysis and Mining, course teaching resources (lecture slides), Lecture 4 Sampling for Big Data

Lecture 4 Sampling for Big Data

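What's the value of π?
MC Approximation of Pi: draw points uniformly in a square of side 2r that encloses a circle of radius r.
$\frac{\text{Area}_{\text{circle}}}{\text{Area}_{\text{square}}} = \frac{\pi r^2}{(2r)(2r)} = \frac{\pi}{4} \quad\Rightarrow\quad \pi \approx 4 \cdot \frac{\#\text{Point}_{\text{circle}}}{\#\text{Point}_{\text{square}}}$
[Figure: scatter plot of the sampled points; the run shown gives π ≈ 3.14616]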
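The Monte Carlo estimate on this slide can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the function name `estimate_pi`, the fixed seed, and the choice r = 1 are assumptions made for the example.

```python
import random


def estimate_pi(n_points, seed=0):
    """Estimate pi from the fraction of uniform points that land in the unit circle."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    inside = 0
    for _ in range(n_points):
        # Draw a point uniformly in the square [-1, 1] x [-1, 1] (r = 1).
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    # Area_circle / Area_square = pi / 4, so pi is about 4 * inside / n_points.
    return 4.0 * inside / n_points


if __name__ == "__main__":
    print(estimate_pi(200_000))
```

The error shrinks roughly like 1/sqrt(n_points), so each extra digit of accuracy costs about 100 times more samples.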

Outline
➢ Motivation / Benefits
➢ Basics of sampling
  • Inverse Transform sampling
  • Rejection sampling
  • Importance sampling
  • Markov chain Monte Carlo (MCMC)
      MH Sampling
      Gibbs Sampling
➢ Stream sampling
➢ Conclusion

Why Sampling?
Big data issue
  • Storage complexity
  • Computational complexity
  • …
Posterior estimation
  • Expectation estimation
  • …
Example: mean age of people in China
[Figure: a sample drawn from a population]

Bad Sampling
  • If you perform your research with bad samples, or with samples that are poorly designed, you will almost certainly get misleading results.
  • Example: sampling only teenagers when estimating the mean age of people in China.

Inverse Transform Sampling
Gaussian distribution
[Figure: scatter of samples from a Gaussian distribution; panels (a) to (c) show Gaussian CDFs that map a uniform value y_i to a sample x_i]

Inverse Transform Sampling
➢ Sampling based on the inverse of the cumulative distribution function (CDF)
➢ Method:
  • Draw $Y_i \sim \mathrm{Uniform}(0, 1)$
  • Set $X_i = \mathrm{CDF}^{-1}(Y_i)$
➢ Drawback:
  • The inverse function is usually hard to obtain in closed form.
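As a minimal sketch of the method, consider a distribution whose inverse CDF is known in closed form. The exponential distribution is an assumption chosen for illustration (the slide does not name one): its CDF is 1 - exp(-rate * x), so CDF^{-1}(y) = -ln(1 - y) / rate. The function name `sample_exponential` is hypothetical.

```python
import math
import random


def sample_exponential(rate, n, seed=0):
    """Draw n Exponential(rate) samples by inverse transform sampling."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        y = rng.random()  # Y_i ~ Uniform[0, 1)
        # X_i = CDF^{-1}(Y_i); 1 - y lies in (0, 1], so the log is defined.
        samples.append(-math.log(1.0 - y) / rate)
    return samples


if __name__ == "__main__":
    xs = sample_exponential(rate=2.0, n=50_000)
    print(sum(xs) / len(xs))  # should be close to 1 / rate
```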

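Example 1
Density:
$f(x) = \begin{cases} 8x, & 0 \le x < 0.25 \\ \frac{8}{3}(1 - x), & 0.25 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$
Inverse CDF:
$F^{-1}(u) = \begin{cases} \frac{\sqrt{u}}{2}, & 0 \le u < 0.25 \\ 1 - \frac{\sqrt{3(1 - u)}}{2}, & 0.25 \le u \le 1 \end{cases}$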
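The sketch below works this example through in code. It assumes the density is f(x) = 8x on [0, 0.25) and (8/3)(1 - x) on [0.25, 1], which gives the CDF F(x) = 4x^2 on the first piece and 1 - (4/3)(1 - x)^2 on the second. The function names are hypothetical.

```python
import math
import random


def cdf(x):
    """CDF of the assumed piecewise-linear density on [0, 1]."""
    if x < 0.25:
        return 4.0 * x * x
    return 1.0 - (4.0 / 3.0) * (1.0 - x) ** 2


def inv_cdf(u):
    """Inverse of cdf, obtained by solving each piece for x."""
    if u < 0.25:
        return math.sqrt(u) / 2.0
    return 1.0 - math.sqrt(3.0 * (1.0 - u)) / 2.0


def sample(n, seed=0):
    """Draw n samples by pushing uniform draws through the inverse CDF."""
    rng = random.Random(seed)
    return [inv_cdf(rng.random()) for _ in range(n)]


if __name__ == "__main__":
    xs = sample(50_000)
    print(sum(xs) / len(xs))  # the mean of this triangular density is 5/12
```

Both pieces agree at the breakpoint, since F(0.25) = 0.25, so the inverse is continuous there.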

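Example 2
$h(x) = \begin{cases} \frac{2m^2}{(1 - m^2)\,x^3}, & x \in [m, 1] \\ 0, & \text{otherwise} \end{cases}$
$H^{-1}(u) = \sqrt{\frac{m^2}{1 - (1 - m^2)\,u}}$
[Figure: samples drawn with my_invcdf plotted against the density samplepdf; x axis from 0.4 to 1.1, N = 1000]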
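A minimal sketch of this example, assuming the density h(x) = 2m^2 / ((1 - m^2) x^3) on [m, 1]. Integrating gives H(x) = (1 - m^2 / x^2) / (1 - m^2), and solving H(x) = u for x gives x = m / sqrt(1 - (1 - m^2) u). The value m = 0.5 and the function names are assumptions for illustration; N = 1000 follows the slide.

```python
import math
import random


def inv_cdf_h(u, m):
    """Inverse CDF of the assumed density h on [m, 1]."""
    return m / math.sqrt(1.0 - (1.0 - m * m) * u)


def sample_h(n, m, seed=0):
    """Draw n samples from h by inverse transform sampling."""
    rng = random.Random(seed)
    return [inv_cdf_h(rng.random(), m) for _ in range(n)]


if __name__ == "__main__":
    m = 0.5  # assumed value; the slide does not state m
    xs = sample_h(1000, m)
    # Under this density the exact mean is 2m / (1 + m).
    print(sum(xs) / len(xs), 2.0 * m / (1.0 + m))
```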

Rejection Sampling
Idea:
  • Accept the samples that fall in the region under the graph of the target density p(x) and reject the others.
Method, with a proposal distribution q(x) and a constant M such that M q(x) ≥ p(x) for all x:
  • Draw $x_i \sim q(x)$
  • Draw $a \sim \mathrm{Uniform}(0, M q(x_i))$
  • If $a \le p(x_i)$, accept $x_i$; otherwise reject it
Acceptance ratio = p(x) / (M q(x))
[Figure: target density p(x) under the envelope M q(x), with the accept region below p(x) and the reject region above it]
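The accept/reject loop can be sketched as follows. The target density p(x) = 6x(1 - x) on [0, 1], the uniform proposal q(x) = 1, and M = 1.5 are assumptions picked for illustration (1.5 is the maximum of p, so M q(x) ≥ p(x) everywhere). The function names are hypothetical.

```python
import random


def target_pdf(x):
    """Assumed target density p(x) = 6x(1 - x) on [0, 1]."""
    return 6.0 * x * (1.0 - x)


def rejection_sample(n, seed=0):
    """Return n accepted samples and the number of proposals it took."""
    rng = random.Random(seed)
    m_const = 1.5  # envelope constant M
    q_density = 1.0  # q(x) for the Uniform(0, 1) proposal
    samples = []
    proposals = 0
    while len(samples) < n:
        x = rng.random()  # x_i ~ q(x)
        proposals += 1
        a = rng.uniform(0.0, m_const * q_density)  # a ~ Uniform(0, M q(x_i))
        if a <= target_pdf(x):  # the point lies under the graph of p: accept
            samples.append(x)
    return samples, proposals


if __name__ == "__main__":
    xs, tried = rejection_sample(20_000)
    print(sum(xs) / len(xs), len(xs) / tried)
```

For normalized p and q the overall acceptance rate is 1/M, so a tighter envelope wastes fewer proposals.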