The Hong Kong Polytechnic University: Discovering Classification Rules

COMP 578: Discovering Classification Rules
Keith C.C. Chan
Department of Computing
The Hong Kong Polytechnic University

An Example Classification Problem
[Figure: patient records (symptoms and treatment) grouped into Recovered and Not Recovered, with two unlabeled cases marked A? and B?]

Classification in Relational DB

Patient  Symptom   Treatment  Recovered
Mike     Headache  Type A     Yes
Mary     Fever     Type A     No
Bill     Cough     Type B2    No
Jim      Fever     Type C1    Yes
Dave     Cough     Type C1    Yes
Anne     Headache  Type B2    Yes

The Recovered column is the class label.
Will John, having a headache and treated with Type C1, recover?

Discovering Classification Rules

Training data:

Name  Symptom   Treat.   Recover?
Mike  Headache  Type A   Yes
Mary  Fever     Type A   No
Bill  Cough     Type B2  No
Jim   Fever     Type C1  Yes
Dave  Cough     Type C1  Yes
Anne  Headache  Type B2  Yes

Mining the training data yields classification rules such as:

IF Symptom = Headache AND Treatment = C1 THEN Recover = Yes

Based on the classification rule discovered, John will recover.
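As a minimal sketch of how a discovered rule is applied to a new record, the rule from the slide can be written as a set of attribute conditions plus a conclusion. The dictionary layout and the name `apply_rule` are illustrative assumptions, not part of the lecture; the slide's treatment value "C1" is written here as "Type C1" to match the table.

```python
# The rule from the slide: IF Symptom = Headache AND Treatment = C1 THEN Recover = Yes.
# The dict layout is an assumption made for this sketch.
RULE = {
    "if": {"symptom": "Headache", "treatment": "Type C1"},
    "then": "Yes",
}


def apply_rule(rule, record):
    """Return the rule's conclusion if every condition matches, else None."""
    if all(record.get(attr) == value for attr, value in rule["if"].items()):
        return rule["then"]
    return None  # the rule does not cover this record


# John: headache, treated with Type C1.
john = {"symptom": "Headache", "treatment": "Type C1"}
print(apply_rule(RULE, john))  # prints: Yes
```

A record that fails any condition gets no prediction from this single rule, which is why a full classification model normally consists of several rules.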

The Classification Problem
Given:
- A database consisting of n records.
- Each record is characterized by m attributes.
- Each record is pre-classified into one of p different classes.
Find:
- A set of classification rules (constituting a classification model) that characterizes the different classes,
- so that records not originally in the database can be accurately classified,
- i.e. "predicting" class labels.

Typical Applications
- Credit approval.
  - Classes can be High Risk, Low Risk?
- Target marketing.
  - What are the classes?
- Medical diagnosis.
  - Classes can be patients with different diseases.
- Treatment effectiveness analysis.
  - Classes can be patients with different degrees of recovery.

Techniques for Discovering Classification Rules
- The k-Nearest Neighbor algorithm.
- The Linear Discriminant Function.
- The Bayesian approach.
- The Decision Tree approach.
- The Neural Network approach.
- The Genetic Algorithm approach.

Example Using the k-NN Algorithm

Salary  Age  Insurance
15K     28   Buy
31K     39   Buy
41K     53   Buy
10K     45   Buy
14K     55   Buy
25K     27   Not Buy
42K     32   Not Buy
18K     38   Not Buy
33K     44   Not Buy

John earns 24K per month and is 42 years old. Will he buy insurance?

The k-Nearest Neighbor Algorithm
- All data records correspond to points in an n-dimensional space.
- The nearest neighbor is defined in terms of Euclidean distance.
- k-NN returns the most common class label among the k training examples nearest to the query point xq.
[Figure: query point xq surrounded by training points labeled + and -]
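The procedure can be sketched on the insurance example as follows. This is a minimal illustration, assuming salary (in thousands) and age are used as raw, unscaled coordinates; the lecture does not say whether attributes are normalized, and the function names are hypothetical.

```python
import math
from collections import Counter

# Training records from the insurance example: ((salary in K, age), label).
TRAINING = [
    ((15, 28), "Buy"),
    ((31, 39), "Buy"),
    ((41, 53), "Buy"),
    ((10, 45), "Buy"),
    ((14, 55), "Buy"),
    ((25, 27), "Not Buy"),
    ((42, 32), "Not Buy"),
    ((18, 38), "Not Buy"),
    ((33, 44), "Not Buy"),
]


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(training, query, k):
    """Return the most common label among the k records nearest to query."""
    nearest = sorted(training, key=lambda rec: euclidean(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


john = (24, 42)  # earns 24K, aged 42
print(knn_classify(TRAINING, john, k=3))  # prints: Not Buy
```

The answer depends on k: on this data k = 3 gives Not Buy, while k = 7 gives Buy, because the more distant neighbors are mostly buyers. Since salary and age are on different scales, rescaling the attributes could also change which records count as nearest.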

The k-NN Algorithm (2)
- k-NN can also be used for continuous-valued labels.
  - Calculate the mean value of the k nearest neighbors.
- Distance-weighted nearest neighbor algorithm.
  - Weight the contribution of each of the k neighbors according to its distance to the query point xq:
    $w = \frac{1}{d(x_q, x_i)^2}$
- Advantage:
  - Robust to noisy data by averaging over the k nearest neighbors.
- Disadvantage:
  - Distance between neighbors could be dominated by irrelevant attributes.
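A minimal sketch of the distance-weighted variant, using the inverse squared distance as the weight and the insurance data from the earlier example. The function names, the unscaled attributes, and the handling of an exact match (distance zero) are assumptions made for illustration.

```python
from collections import defaultdict

# Insurance example: ((salary in K, age), label).
TRAINING = [
    ((15, 28), "Buy"),
    ((31, 39), "Buy"),
    ((41, 53), "Buy"),
    ((10, 45), "Buy"),
    ((14, 55), "Buy"),
    ((25, 27), "Not Buy"),
    ((42, 32), "Not Buy"),
    ((18, 38), "Not Buy"),
    ((33, 44), "Not Buy"),
]


def squared_distance(a, b):
    """Squared Euclidean distance d(a, b)^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_knn_classify(training, query, k):
    """Each of the k nearest neighbors votes with weight 1 / d(xq, xi)^2."""
    nearest = sorted(training, key=lambda rec: squared_distance(rec[0], query))[:k]
    scores = defaultdict(float)
    for point, label in nearest:
        d2 = squared_distance(point, query)
        if d2 == 0:
            return label  # assumed convention: an exact match decides the class
        scores[label] += 1.0 / d2
    return max(scores, key=scores.get)


john = (24, 42)
print(weighted_knn_classify(TRAINING, john, k=9))  # prints: Not Buy
```

With k = 9 every training record votes. A plain majority vote over all nine records would return Buy (5 to 4), but the weighted vote returns Not Buy because the closest records to John are mostly non-buyers.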