重庆大学:《数据仓库与数据挖掘 Data Warehouse and Data mining》课程PPT教学课件(英文版)Chapter 7 Classification:Basic Concepts

Chapter 7. Classification: Basic Concepts Classification: Basic Concepts Decision tree induction Bayes Classification Methods Rule-Based classification Model evaluation and selection ■ Summary
1 Chapter 7. Classification: Basic Concepts ◼ Classification: Basic Concepts ◼ Decision Tree Induction ◼ Bayes Classification Methods ◼ Rule-Based Classification ◼ Model Evaluation and Selection ◼ Summary

Classification vs Prediction Classification predicts categorical class labels(discrete or nominal classifies data(constructs a model) based on the training set and the values class labels)in a classifying attribute and uses it in classifying new data Prediction models continuous-valued functions, i. e, predicts unknown or missing values
2 ◼ Classification ◼ predicts categorical class labels (discrete or nominal) ◼ classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute and uses it in classifying new data ◼ Prediction ◼ models continuous-valued functions, i.e., predicts unknown or missing values Classification vs. Prediction

Classifications Definition Given a collection of records (training set Each record contains a set of attributes one of the attributes is the class find a model for class attribute as a function of the values of other attributes Goal previously unseen records should be assigned a class as accurately as possible a test set is used to determine the accuracy of the model Usually the given data set is divided into training and test sets, with training set used to build the model and test set used to validate it
3 Classification: Definition ◼ Given a collection of records (training set ) ◼ Each record contains a set of attributes, one of the attributes is the class. ◼ Find a model for class attribute as a function of the values of other attributes. ◼ Goal: previously unseen records should be assigned a class as accurately as possible. ◼ A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with training set used to build the model and test set used to validate it

Supervised vs Unsupervised Learning Supervised learning (classification) Supervision The training data (observations measurements etc are accompanied by labels indicating the class of the observations New data is classified based on the training set Unsupervised learning(clustering) The class labels of training data is unknown Given a set of measurements observations etc with the aim of establishing the existence of classes or clusters in e data 4
4 Supervised vs. Unsupervised Learning ◼ Supervised learning (classification) ◼ Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations ◼ New data is classified based on the training set ◼ Unsupervised learning (clustering) ◼ The class labels of training data is unknown ◼ Given a set of measurements, observations, etc. with the aim of establishing the existence of classes or clusters in the data

Prediction problems: Classification vs. Numeric Prediction Classification predicts categorical class labels(discrete or nominal) classifies data(constructs a model) based on the training set and the values (class labels)in a classifying attribute and uses it in classifying new data ■ Numeric Prediction models continuous-valued functions, i.e. predicts unknown or missing values Typical applications Credit/loan approval Medical diagnosis: if a tumor is cancerous or benign a fraud detection: if a transaction is fraudulent u Web page categorization: which category it is 5
5 ◼ Classification ◼ predicts categorical class labels (discrete or nominal) ◼ classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute and uses it in classifying new data ◼ Numeric Prediction ◼ models continuous-valued functions, i.e., predicts unknown or missing values ◼ Typical applications ◼ Credit/loan approval: ◼ Medical diagnosis: if a tumor is cancerous or benign ◼ Fraud detection: if a transaction is fraudulent ◼ Web page categorization: which category it is Prediction Problems: Classification vs. Numeric Prediction

Classification→ATw。- Step Process Model construction: describing a set of predetermined classes Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute the set of tuples used for model construction is training set the model is represented as classification rules decision trees or mathematical formulae Model usage: for classifying future or unknown objects Estimate accuracy of the model The known label of test sample is compared with the classified result from the model Accuracy rate is the percentage of test set samples that are correctly classified by the model Test set is independent of training set otherwise overfitting If the accuracy is acceptable, use the model to classify new data Note: If the test set is used to select models, it is called validation (test) set
6 Classification—A Two-Step Process ◼ Model construction: describing a set of predetermined classes ◼ Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute ◼ The set of tuples used for model construction is training set ◼ The model is represented as classification rules, decision trees, or mathematical formulae ◼ Model usage: for classifying future or unknown objects ◼ Estimate accuracy of the model ◼ The known label of test sample is compared with the classified result from the model ◼ Accuracy rate is the percentage of test set samples that are correctly classified by the model ◼ Test set is independent of training set (otherwise overfitting) ◼ If the accuracy is acceptable, use the model to classify new data ◼ Note: If the test set is used to select models, it is called validation (test) set

ILlustrating Classification Task Tid Attrib1 Attrib2 Attrib3 Class Learning algorithm tedium 100K No Smal 4 Yes Induction 7 Yes 220K Learn Smal Model Medium 10 No Small 90K Yes Training set Model Appl Tid Attrib Attrib2 Attrib3 Class Model Small 55K 12 Yes Medium 110K Deduction 14|No 15 No 7K ? Test Set 7
7 Illustrating Classification Task Apply Model Induction Deduction Learn Model Model Tid Attrib1 Attrib2 Attrib3 Class 1 Yes Large 125K No 2 No Medium 100K No 3 No Small 70K No 4 Yes Medium 120K No 5 No Large 95K Yes 6 No Medium 60K No 7 Yes Large 220K No 8 No Small 85K Yes 9 No Medium 75K No 10 No Small 90K Yes 10 Tid Attrib1 Attrib2 Attrib3 Class 11 No Small 55K ? 12 Yes Medium 80K ? 13 Yes Large 110K ? 14 No Small 95K ? 15 No Large 67K ? 10 Test Set Learning algorithm Training Set

Process (O): Model Construction Classification algorithms Training Data NAMERANK YEARS TENURED Classifier Mike Assistant Prof no (Model) Mary Assistant Prof es Bⅲ Professor 372763 yes Associate prof yes Dave Assistant Prof IF rank="professor no Anne Associate Prof OR years >6 no ThEN tenured =yes 8
8 Process (1): Model Construction Training Data NAME RANK YEARS TENURED Mike Assistant Prof 3 n o Mary Assistant Prof 7 yes Bill Professor 2 yes Jim Associate Prof 7 yes Dave Assistant Prof 6 n o Anne Associate Prof 3 n o Classification Algorithms IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’ Classifier (Model)

Process(2): Using the Model in Prediction Classifier Testing Data Unseen Data (Jeff, Professor, 4) NAME RANK YEARS TENURED Tom Assistant Prof no Tenured? Merlis a Associate prof no George rofessor 2757 yes Yes Joseph Assistant Prof yes
9 Process (2): Using the Model in Prediction Classifier Testing Data NAME RANK YEARS TENURED Tom Assistant Prof 2 n o Merlisa Associate Prof 7 n o George Professor 5 yes Joseph Assistant Prof 7 yes Unseen Data (Jeff, Professor, 4) Tenured?

Algorithm for Decision Tree Induction Basic algorithm (a greedy algorithm) Tree is constructed in a top-down recursive divide-and conquer manner At start all the training examples are at the root Attributes are categorical (if continuous-valued, they are discretized in advance) Examples are partitioned recursively based on selected attributes a Test attributes are selected on the basis of a heuristic or statistical measure(e.g information gain Conditions for stopping partitioning All samples for a given node belong to the same class There are no remaining attributes for further partitioning majority voting is employed for classifying the leaf There are no samples left
10 Algorithm for Decision Tree Induction ◼ Basic algorithm (a greedy algorithm) ◼ Tree is constructed in a top-down recursive divide-andconquer manner ◼ At start, all the training examples are at the root ◼ Attributes are categorical (if continuous-valued, they are discretized in advance) ◼ Examples are partitioned recursively based on selected attributes ◼ Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain) ◼ Conditions for stopping partitioning ◼ All samples for a given node belong to the same class ◼ There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf ◼ There are no samples left
按次数下载不扣除下载券;
注册用户24小时内重复下载只扣除一次;
顺序:VIP每日次数-->可用次数-->下载券;
- 重庆大学:《数据仓库与数据挖掘 Data Warehouse and Data mining》课程PPT教学课件(英文版)Chapter 6 Advanced Frequent Pattern Mining.ppt
- 重庆大学:《数据仓库与数据挖掘 Data Warehouse and Data mining》课程PPT教学课件(英文版)Chapter 5 Mining Frequent Patterns, Association and Correlations:Basic Concepts and Methods.ppt
- 重庆大学:《数据仓库与数据挖掘 Data Warehouse and Data mining》课程PPT教学课件(英文版)Chapter 4 OLAP - Data Warehousing and On-line Analytical Processing.ppt
- 重庆大学:《数据仓库与数据挖掘 Data Warehouse and Data mining》课程PPT教学课件(英文版)Chapter 3 Data Preprocessing.ppt
- 重庆大学:《数据仓库与数据挖掘 Data Warehouse and Data mining》课程PPT教学课件(英文版)Chapter 2 about data - Getting to Know Your Data.ppt
- 重庆大学:《数据仓库与数据挖掘 Data Warehouse and Data mining》课程PPT教学课件(英文版)Chapter 1 introduction.ppt
- 重庆师范大学:《人工智能 AI》精品课程PPT教学课件_第7章 机器人规划.ppt
- 重庆师范大学:《人工智能 AI》精品课程PPT教学课件_第6章 机器学习.ppt
- 重庆师范大学:《人工智能 AI》精品课程PPT教学课件_第5章 搜索策略.ppt
- 重庆师范大学:《人工智能 AI》精品课程PPT教学课件_第4章 智能计算(计算智能).ppt
- 重庆师范大学:《人工智能 AI》精品课程PPT教学课件_第3章 推理技术.ppt
- 重庆师范大学:《人工智能 AI》精品课程PPT教学课件_第2章 知识表示.ppt
- 重庆师范大学:《人工智能 AI》精品课程PPT教学课件_绪论、第1章 人工智能概述.ppt
- 重庆师范大学:《人工智能》精品课程PPT教学课件_VR虚拟现实和AR增强现实技术.ppt
- 重庆大学:《大数据技术基础》课程教学资源(课件讲稿)09 Spark内存计算.pdf
- 重庆大学:《大数据技术基础》课程教学资源(课件讲稿)08 流计算 Stream Computing.pdf
- 重庆大学:《大数据技术基础》课程教学资源(课件讲稿)07 图计算 Graph Computing.pdf
- 重庆大学:《大数据技术基础》课程教学资源(课件讲稿)06 HBase.pdf
- 重庆大学:《大数据技术基础》课程教学资源(课件讲稿)05 HDFS.pdf
- 重庆大学:《大数据技术基础》课程教学资源(课件讲稿)04 MapReduce.pdf
- 重庆大学:《数据仓库与数据挖掘 Data Warehouse and Data mining》课程PPT教学课件(英文版)Chapter 8 Cluster Analysis:Basic Concepts and Methods.pptx
- 重庆大学:《数据仓库与数据挖掘 Data Warehouse and Data mining》课程PPT教学课件(英文版)Chapter 9 Outlier Analysis.ppt
- 延安大学:《网页制作基础教程》课程教学资源_教学大纲.pdf
- 延安大学:《网页制作基础教程》学术论文_基于AJAX技术的Web模型在网站互动平台的应用研究.pdf
- 延安大学:《网页制作基础教程》学术论文_基于RIA技术的实验演示系统的设计与实现.pdf
- 延安大学:《网页制作基础教程》学术论文_服务器推技术在实验演示系统中的应用.pdf
- 延安大学:《网页制作基础教程》学术论文_用户行为驱动的网页布局自动调整的研究.pdf
- 《网页制作基础教程》参考书籍(PDF):JavaScript 权威指南(第四版).pdf
- 《网页制作基础教程》参考书籍(PDF):Python学习手册(第3版,涵盖Pathon 2.5).pdf
- 《网页制作基础教程》参考书籍:CSS Mastery 精通CSS书籍——高级WEB标准解决方案(人民邮电出版社).pdf
- 延安大学:《网页制作基础教程》课程PPT教学课件_第一章 网页结构(牛永洁).ppt
- 延安大学:《网页制作基础教程》课程PPT教学课件_第二章 网页头部.ppt
- 延安大学:《网页制作基础教程》课程PPT教学课件_第三章 格式化.ppt
- 延安大学:《网页制作基础教程》课程PPT教学课件_第四章 列表的应用.ppt
- 延安大学:《网页制作基础教程》课程PPT教学课件_第五章 使用图像与多媒体.ppt
- 延安大学:《网页制作基础教程》课程PPT教学课件_第六章 使用超级链接.ppt
- 延安大学:《网页制作基础教程》课程PPT教学课件_第七章 在网页中使用表格.ppt
- 延安大学:《网页制作基础教程》课程PPT教学课件_第八章 在网页中使用框架的使用.ppt
- 延安大学:《网页制作基础教程》课程PPT教学课件_第九章 表单.ppt
- 延安大学:《网页制作基础教程》课程PPT教学课件_第十章 XHTML.ppt