Tongji University: Big Data Analysis and Mining (《大数据分析与数据挖掘》), course teaching resources (PPT lecture slides): Decision Tree

Document information
Resource category: Library (文库)
Document format: PPT
Pages: 43
File size: 1.13 MB

Big Data Analysis and Mining: Decision Tree
Qinpei Zhao (赵钦佩), qinpeizhao@tongji.edu.cn
2015 Fall
2021/2/9

Illustrating Classification Task

Training Set
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Induction: Training Set -> Learning algorithm -> Learn Model -> Model
Deduction: Model -> Apply Model -> Test Set

Classification: Definition
◼ Given a collection of records (training set)
  ◆ Each record contains a set of attributes; one of the attributes is the class.
◼ Find a model for the class attribute as a function of the values of the other attributes.
◼ Goal: previously unseen records should be assigned a class as accurately as possible.
  ◆ A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
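The split-then-validate procedure in this definition can be sketched in a few lines of Python. This is only an illustration: the records are the ten labeled rows from the "Illustrating Classification Task" slide, while the helper names (`split_records`, `fit_majority_class`, `accuracy`), the 30% test fraction, and the majority-class placeholder model are assumptions made for the example, not part of the course material.

```python
import random
from collections import Counter

# Ten labeled records from the "Illustrating Classification Task" slide:
# (Attrib1, Attrib2, Attrib3 in thousands, Class)
RECORDS = [
    ("Yes", "Large", 125, "No"),
    ("No", "Medium", 100, "No"),
    ("No", "Small", 70, "No"),
    ("Yes", "Medium", 120, "No"),
    ("No", "Large", 95, "Yes"),
    ("No", "Medium", 60, "No"),
    ("Yes", "Large", 220, "No"),
    ("No", "Small", 85, "Yes"),
    ("No", "Medium", 75, "No"),
    ("No", "Small", 90, "Yes"),
]


def split_records(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_majority_class(training):
    """Placeholder model (an assumption): always predict the most common class."""
    majority = Counter(record[-1] for record in training).most_common(1)[0][0]
    return lambda attributes: majority


def accuracy(model, test):
    """Fraction of test records whose predicted class equals the true class."""
    correct = sum(1 for *attributes, label in test if model(attributes) == label)
    return correct / len(test)


training_set, test_set = split_records(RECORDS)
model = fit_majority_class(training_set)
print(f"{len(training_set)} training records, {len(test_set)} test records")
print(f"test accuracy: {accuracy(model, test_set):.2f}")
```

The model is built from `training_set` only and scored on `test_set` only, which is the separation the definition asks for. Any real classifier, a decision tree included, could replace the placeholder without changing the rest.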

Examples of Classification Task
◼ Predicting tumor cells as benign or malignant
◼ Classifying credit card transactions as legitimate or fraudulent
◼ Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
◼ Categorizing news stories as finance, weather, entertainment, sports, etc.

What is a Decision Tree?
◼ An inductive learning task
  ◆ Use particular facts to make more generalized conclusions
◼ A predictive model based on a branching series of Boolean tests
  ◆ These smaller Boolean tests are less complex than a one-stage classifier
◼ Let's look at a sample decision tree…

Example: Tax cheating

Training Data
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)
Refund?
  Yes: NO
  No:  MarSt?
         Married:          NO
         Single, Divorced: TaxInc?
                             < 80K: NO
                             > 80K: YES
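One way to read the tree on this slide is as a chain of nested tests. The sketch below writes it as a plain Python function and checks that it reproduces the Cheat column for all ten training records. The function name `classify` is hypothetical, and sending an income of exactly 80K to the YES branch is an assumption, since the slide gives only the 80K threshold.

```python
# Training data from the slide:
# (Tid, Refund, Marital Status, Taxable Income in thousands, Cheat)
TRAINING_DATA = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]


def classify(refund, marital_status, taxable_income):
    """Follow the slide's tree: Refund, then MarSt, then TaxInc."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Single or Divorced: split on taxable income at 80K.
    # Assumption: exactly 80K falls on the "Yes" side.
    return "No" if taxable_income < 80 else "Yes"


misclassified = [
    tid
    for tid, refund, status, income, cheat in TRAINING_DATA
    if classify(refund, status, income) != cheat
]
print("misclassified Tids:", misclassified)
```

An empty list means the tree fits the training data exactly; it says nothing yet about accuracy on unseen records.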

Example: Tax cheating (a second tree)

Training data: the same ten records (Tid 1 to 10) as in the first tax cheating example.

MarSt?
  Married:          NO
  Single, Divorced: Refund?
                      Yes: NO
                      No:  TaxInc?
                             < 80K: NO
                             > 80K: YES

There could be more than one tree that fits the same data!
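The claim that more than one tree fits the same data can be checked directly. The sketch below encodes both tax cheating trees as functions (the names `tree_refund_first` and `tree_marst_first` are hypothetical) and confirms that each one reproduces every training label. As before, treating exactly 80K as the YES side is an assumption.

```python
# (Refund, Marital Status, Taxable Income in thousands, Cheat) for Tid 1..10
TRAINING_DATA = [
    ("Yes", "Single", 125, "No"),
    ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"),
    ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"),
    ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"),
    ("No", "Single", 90, "Yes"),
]


def tree_refund_first(refund, status, income):
    """First tree: Refund at the root, then MarSt, then TaxInc."""
    if refund == "Yes":
        return "No"
    if status == "Married":
        return "No"
    return "No" if income < 80 else "Yes"  # assumption: 80K itself -> "Yes"


def tree_marst_first(refund, status, income):
    """Second tree: MarSt at the root, then Refund, then TaxInc."""
    if status == "Married":
        return "No"
    if refund == "Yes":
        return "No"
    return "No" if income < 80 else "Yes"  # assumption: 80K itself -> "Yes"


def fits(tree, data):
    """True if the tree predicts the recorded class for every record."""
    return all(tree(r, s, i) == cheat for r, s, i, cheat in data)


print("refund-first tree fits:", fits(tree_refund_first, TRAINING_DATA))
print("marst-first tree fits: ", fits(tree_marst_first, TRAINING_DATA))

# Compare the two trees on a grid of attribute values, not just the training set.
same_everywhere = all(
    tree_refund_first(r, s, i) == tree_marst_first(r, s, i)
    for r in ("Yes", "No")
    for s in ("Single", "Married", "Divorced")
    for i in range(0, 301, 5)
)
print("trees agree on the whole grid:", same_everywhere)
```

In this particular pair the two trees differ only in the order of the Refund and MarSt tests, so they encode the same rule and agree on every record. In general, two trees that fit the same training data can still disagree on unseen records.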

Decision Tree Classification Task

Training Set (Tid 1 to 10) and Test Set (Tid 11 to 15): the same Attrib1, Attrib2, Attrib3, Class tables as in "Illustrating Classification Task".

Induction: Training Set -> Tree Induction algorithm -> Learn Model -> Decision Tree
Deduction: Decision Tree -> Apply Model -> Test Set

Apply Model to Test Data

Test Data
Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Start from the root of the tree:
Refund?
  Yes: NO
  No:  MarSt?
         Married:          NO
         Single, Divorced: TaxInc?
                             < 80K: NO
                             > 80K: YES
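Applying the model means walking from the root to a leaf, taking at each node the branch that matches the record. A minimal sketch of that walk follows, with the tree stored as nested dictionaries. The dictionary layout and the name `classify_with_path` are assumptions made for illustration, and the ">= 80K" branch label reflects the assumption that exactly 80K goes to the YES side.

```python
# Internal nodes are dicts with a name, a test function, and branches.
# Leaves are plain class-label strings.
TREE = {
    "name": "Refund",
    "test": lambda r: r["Refund"],
    "branches": {
        "Yes": "NO",
        "No": {
            "name": "MarSt",
            "test": lambda r: (
                "Married" if r["Marital Status"] == "Married" else "Single, Divorced"
            ),
            "branches": {
                "Married": "NO",
                "Single, Divorced": {
                    "name": "TaxInc",
                    # Assumption: exactly 80K takes the ">= 80K" branch.
                    "test": lambda r: "< 80K" if r["Taxable Income"] < 80 else ">= 80K",
                    "branches": {"< 80K": "NO", ">= 80K": "YES"},
                },
            },
        },
    },
}


def classify_with_path(tree, record):
    """Walk from the root to a leaf; return the leaf label and the tests taken."""
    path = []
    node = tree
    while isinstance(node, dict):
        outcome = node["test"](record)
        path.append(f'{node["name"]} = {outcome}')
        node = node["branches"][outcome]
    return node, path


# The test record from the slide (Taxable Income in thousands).
test_record = {"Refund": "No", "Marital Status": "Married", "Taxable Income": 80}
label, path = classify_with_path(TREE, test_record)
print(" -> ".join(path), "=>", label)
```

For this record the walk ends at the Married leaf, so the TaxInc test is never reached and the 80K boundary assumption does not affect the result.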

