University of Electronic Science and Technology of China: Big Data Analysis and Mining, course teaching resources (lecture slides), Lecture 2 Basic Concepts (Foundations of Data Mining)

Lecture 2 Foundations of Data Mining

Example 1
• Payment prediction
  – Can we predict a person's salary from their age, years of education, and working hours per week?

Age   Edu. year   HoursPerWeek   Pay
25    7           40             <50k
38    9           50             ≥50k
28    12          40             ≥50k
24    10          40             <50k
55    4           10             ?

Task type: Classification

Example 2
• Items clustering
  – Color based
  – Shape based

Task type: Clustering
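The point of the two panels, that the same items fall into different groups depending on which notion of similarity is used, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item list `ITEMS` and the helper `group_by` are hypothetical, and grouping by a known attribute value stands in for a real clustering algorithm, which would have to discover the groups from a distance measure.

```python
from collections import defaultdict

# Hypothetical items, invented for illustration (not from the slide).
ITEMS = [
    {"color": "red", "shape": "circle"},
    {"color": "red", "shape": "square"},
    {"color": "blue", "shape": "circle"},
    {"color": "blue", "shape": "square"},
]

def group_by(items, key):
    """Group item indices by the value of one attribute."""
    groups = defaultdict(list)
    for index, item in enumerate(items):
        groups[item[key]].append(index)
    return dict(groups)

# The same four items, grouped under two different similarity criteria.
by_color = group_by(ITEMS, "color")
by_shape = group_by(ITEMS, "shape")
```

Neither grouping is the "correct" one; without labels, the result depends on the chosen criterion.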

What are the tasks in ML?
– Supervised learning: learns the mapping function or relationship between the features and the labels from labeled data, namely $Y = F(X \mid \theta)$ (e.g. classification, prediction).
– Unsupervised learning: learns the intrinsic structure of unlabeled data (e.g. clustering, latent factor learning, frequent itemset mining).
– Semi-supervised learning: can be regarded as unsupervised learning with some constraints on labels, or as supervised learning with additional information on the distribution of the data.

Typical tasks: Classification, Clustering, Association Rule Mining, Outlier Detection

Supervised Learning
Given training data $\{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, where $y_i$ is the corresponding label of data $x_i$, supervised learning learns the mapping function $Y = F(X \mid \theta)$, or the posterior distribution $P(Y \mid X)$.
• Supervised problems
  – Classification
  – Regression
  – Learn to Rank
  – Tagging
  – ……

[Figure: decision tree with dependent variable PLAY, splitting on OUTLOOK (sunny, overcast, rain), HUMIDITY and WINDY (TRUE, FALSE), with Play / Don't Play counts at each node]

Example: Payment Prediction Revisit
• Find the mapping function or model to answer whether one's salary is more than 50k.

Age   Edu. year   HoursPerWeek   Pay
25    7           40             <50k
38    9           50             ≥50k
28    12          40             ≥50k
24    10          40             <50k
55    4           10             ?

If the solid points represent "salary < 50k" and the hollow ones "salary ≥ 50k", we can use a line to separate the two groups of points.
Which line is better?
– The blue one
– Why? It gives the minimum error on the predicted (separated) results.
– A good model should minimize the loss on the training data
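Comparing candidate separators by their training error can be sketched as follows. The four labeled rows come from the payment table; the two linear rules `LINE_A` and `LINE_B` are hypothetical, chosen by hand for illustration, and are not the lines drawn on the slide.

```python
# Labeled rows from the payment table: (age, edu_year, hours_per_week).
# Label 1 means ">=50k", label 0 means "<50k".
DATA = [
    ((25, 7, 40), 0),
    ((38, 9, 50), 1),
    ((28, 12, 40), 1),
    ((24, 10, 40), 0),
]

def predict(weights, bias, x):
    """Linear rule: predict 1 when w . x + b >= 0, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else 0

def training_errors(weights, bias, data):
    """Number of misclassified training rows (sum of 0-1 losses)."""
    return sum(1 for x, y in data if predict(weights, bias, x) != y)

# Hypothetical candidate separators (assumptions, not from the slide).
LINE_A = ((1, 2, 0), -50)  # age + 2 * edu_year >= 50
LINE_B = ((0, 0, 1), -45)  # hours_per_week >= 45

errors_a = training_errors(*LINE_A, DATA)  # separates all four rows
errors_b = training_errors(*LINE_B, DATA)  # misclassifies the (28, 12, 40) row
```

Under the minimum-error criterion, `LINE_A` would be preferred on this training set because it makes fewer mistakes.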

LOSS FUNCTION
To measure the quality of the predicted results, we introduce the loss function $L(Y, F(X \mid \theta))$, which is a non-negative function.
– 0-1 loss: $L(y, F(x \mid \theta)) = \begin{cases} 0, & y = F(x \mid \theta) \\ 1, & y \neq F(x \mid \theta) \end{cases}$
– Squared loss: $L(y, F(x \mid \theta)) = (y - F(x \mid \theta))^2$
– Absolute loss: $L(y, F(x \mid \theta)) = |y - F(x \mid \theta)|$
– Log loss: $L(y, P(y \mid x, \theta)) = -\log P(y \mid x, \theta)$
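The four losses translate directly into code. The sketch below is a minimal Python rendering; the function names are illustrative choices, and `log_loss` assumes its argument is the probability $P(y \mid x, \theta)$ that the model assigns to the true label, which must lie in (0, 1].

```python
import math

def zero_one_loss(y, y_pred):
    """0-1 loss: 0 when the prediction equals the label, otherwise 1."""
    return 0 if y == y_pred else 1

def squared_loss(y, y_pred):
    """Squared loss: (y - F(x))^2."""
    return (y - y_pred) ** 2

def absolute_loss(y, y_pred):
    """Absolute loss: |y - F(x)|."""
    return abs(y - y_pred)

def log_loss(p_true):
    """Log loss: -log of the probability assigned to the true label."""
    return -math.log(p_true)
```

All four return non-negative values: the first three by construction, and the log loss because the logarithm of a probability is never positive.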

Training Loss and Test Loss
– Training loss: loss on training data
– Test loss: loss on test data

[Figure: performance on training data of three models]
[Figure: performance on training data and test data]

Who wins?
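The gap between training loss and test loss can be illustrated with a small experiment. Everything here is an assumption made for the sketch: the data are synthetic (training labels follow y = 2x with alternating noise of ±1, test labels are noise-free for simplicity), and the three models (a constant, a least-squares line, and a nearest-neighbor memorizer) are stand-ins for the three models in the figures, which the transcript does not specify.

```python
def mean_squared_loss(model, data):
    """Average squared loss of `model` over (x, y) pairs."""
    return sum((y - model(x)) ** 2 for x, y in data) / len(data)

# Synthetic data (hypothetical): y = 2x, training labels perturbed by +1/-1.
TRAIN = [(0, 1), (1, 1), (2, 5), (3, 5), (4, 9), (5, 9)]
TEST = [(0.4, 0.8), (1.4, 2.8), (2.4, 4.8), (3.4, 6.8), (4.4, 8.8)]

def fit_constant(data):
    """Too simple: always predicts the mean training label."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y

def fit_line(data):
    """Ordinary least-squares line through the training points."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_memorizer(data):
    """Too flexible: returns the label of the nearest training point."""
    return lambda x: min(data, key=lambda pair: abs(pair[0] - x))[1]

models = {
    "constant": fit_constant(TRAIN),
    "line": fit_line(TRAIN),
    "memorizer": fit_memorizer(TRAIN),
}
train_loss = {name: mean_squared_loss(m, TRAIN) for name, m in models.items()}
test_loss = {name: mean_squared_loss(m, TEST) for name, m in models.items()}
```

On this data the memorizer reaches zero training loss because it reproduces the noisy labels exactly, yet the line has the lowest test loss, so the winner on training data is not the winner on test data.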

Generalization
Empirical risk: $R(F) = \frac{1}{N} \sum_{i=1}^{N} L(y_i, F(x_i))$
Note: A good model cannot take only the training loss into account and minimize the empirical risk. Instead, it should improve the model's generalization.

[Figure: three panels, each showing Model, True function and Samples]

Model Selection: To avoid Underfitting and Overfitting
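The empirical risk is simply the average loss over the N samples, which a short Python sketch makes concrete. The helper name `empirical_risk`, the three sample pairs, and the model `double` are hypothetical, chosen so the result can be checked by hand.

```python
def empirical_risk(model, data, loss):
    """R(F) = (1/N) * sum of loss(y_i, F(x_i)) over the N pairs in data."""
    return sum(loss(y, model(x)) for x, y in data) / len(data)

def squared_loss(y, y_pred):
    """Squared loss, used here as the per-sample loss L."""
    return (y - y_pred) ** 2

# Hypothetical samples and model, invented for illustration.
SAMPLES = [(1, 2), (2, 4), (3, 7)]

def double(x):
    """Candidate model F(x) = 2x."""
    return 2 * x

# Per-sample losses are 0, 0 and 1, so the risk is 1/3.
risk = empirical_risk(double, SAMPLES, squared_loss)
```

The same function evaluated on held-out samples gives an estimate of how well the model generalizes, which is the quantity model selection cares about.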