Tongji University, "Big Data Analysis and Mining" course teaching resources (PPT lecture slides): Evaluation & other classifiers

Big Data Analysis and Mining: Evaluation & other classifiers. Qinpei Zhao (赵钦佩), 2015 Fall. 2021/2/8

ROC curve
- Receiver Operating Characteristic (ROC) graphs have long been used in signal detection theory to depict the tradeoff between hit rates and false alarm rates over noisy channels.
- Recent years have seen an increase in the use of ROC graphs in the machine learning community.
- ROC graphs are a useful technique for organizing classifiers and visualizing their performance.
- They are especially useful for domains with skewed class distributions and unequal classification error costs.

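Confusion matrix (true condition vs. predicted condition)
- Condition positive, test outcome positive: true positive (TP)
- Condition negative, test outcome positive: false positive (FP, Type I error)
- Condition positive, test outcome negative: false negative (FN, Type II error)
- Condition negative, test outcome negative: true negative (TN)

Derived measures (Total population = TP + FP + FN + TN):
- Positive predictive value (PPV), precision: $\mathrm{PPV} = \frac{TP}{TP + FP}$
- False discovery rate (FDR): $\mathrm{FDR} = \frac{FP}{TP + FP}$
- False omission rate (FOR): $\mathrm{FOR} = \frac{FN}{FN + TN}$
- Negative predictive value (NPV): $\mathrm{NPV} = \frac{TN}{FN + TN}$
- True positive rate (TPR), sensitivity, recall: $\mathrm{TPR} = \frac{TP}{TP + FN}$
- False positive rate (FPR), fall-out: $\mathrm{FPR} = \frac{FP}{FP + TN}$
- False negative rate (FNR): $\mathrm{FNR} = \frac{FN}{TP + FN}$
- True negative rate (TNR), specificity: $\mathrm{TNR} = \frac{TN}{FP + TN}$
- Accuracy (ACC): $\mathrm{ACC} = \frac{TP + TN}{\text{Total population}}$
- Positive likelihood ratio: $\mathrm{LR}^{+} = \mathrm{TPR} / \mathrm{FPR}$
- Negative likelihood ratio: $\mathrm{LR}^{-} = \mathrm{FNR} / \mathrm{TNR}$
- Diagnostic odds ratio: $\mathrm{DOR} = \mathrm{LR}^{+} / \mathrm{LR}^{-}$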
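The derived measures of the confusion matrix can all be computed from the four counts. The sketch below is illustrative only: the `ConfusionMatrix` class and its field names are hypothetical, and it assumes every denominator is non-zero (e.g., at least one positive and one negative instance, and at least one predicted positive and one predicted negative).

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper, names are illustrative)."""
    tp: int  # true positives
    fp: int  # false positives (Type I errors)
    fn: int  # false negatives (Type II errors)
    tn: int  # true negatives

    def measures(self) -> dict:
        """Return the derived measures as a name -> value dict.

        Assumes no denominator is zero.
        """
        tpr = self.tp / (self.tp + self.fn)  # sensitivity, recall
        fpr = self.fp / (self.fp + self.tn)  # fall-out
        fnr = self.fn / (self.tp + self.fn)
        tnr = self.tn / (self.fp + self.tn)  # specificity
        lr_pos = tpr / fpr
        lr_neg = fnr / tnr
        total = self.tp + self.fp + self.fn + self.tn
        return {
            "PPV": self.tp / (self.tp + self.fp),  # precision
            "FDR": self.fp / (self.tp + self.fp),
            "FOR": self.fn / (self.fn + self.tn),
            "NPV": self.tn / (self.fn + self.tn),
            "TPR": tpr,
            "FPR": fpr,
            "FNR": fnr,
            "TNR": tnr,
            "ACC": (self.tp + self.tn) / total,
            "LR+": lr_pos,
            "LR-": lr_neg,
            "DOR": lr_pos / lr_neg,
        }
```

For example, `ConfusionMatrix(tp=63, fp=28, fn=37, tn=72).measures()` gives TPR = 0.63, FPR = 0.28 and ACC = 0.675. Note that DOR simplifies to $(TP \cdot TN)/(FP \cdot FN)$.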

ROC curve
An ROC curve is a plot of TPR (sensitivity) against FPR (1 − specificity), which depicts the relative trade-offs between benefits (true positives) and costs (false positives).
- $\text{Sensitivity} = \frac{\text{number of true positives}}{\text{number of true positives} + \text{number of false negatives}}$
- $\text{Specificity} = \frac{\text{number of true negatives}}{\text{number of true negatives} + \text{number of false positives}}$

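Example: ROC space
[Figure: ROC space with sensitivity (TPR) on the vertical axis and FPR (1 − specificity) on the horizontal axis; perfect classification is the top-left corner, points above the diagonal are better than random and points below it are worse.]
Two classifiers evaluated on 100 positive and 100 negative instances (200 in total):
- Classifier A: TP = 63, FP = 28, FN = 37, TN = 72
- Another classifier: TP = 76, FP = 12, FN = 24, TN = 88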
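To see where the two example classifiers land in ROC space, each set of counts can be mapped to an (FPR, TPR) point and compared with the diagonal TPR = FPR, which corresponds to random guessing. This is a small sketch; the function names `roc_point` and `position` are hypothetical.

```python
def roc_point(tp: int, fp: int, fn: int, tn: int) -> tuple:
    """Map confusion-matrix counts to an (FPR, TPR) point in ROC space."""
    tpr = tp / (tp + fn)  # sensitivity
    fpr = fp / (fp + tn)  # 1 - specificity
    return fpr, tpr


def position(fpr: float, tpr: float) -> str:
    """Compare a point with the diagonal TPR = FPR (random guessing)."""
    if tpr > fpr:
        return "better than random"
    if tpr < fpr:
        return "worse than random"
    return "on the diagonal (random)"


# Counts from the example: classifier A and the second classifier.
examples = {
    "A": (63, 28, 37, 72),
    "second": (76, 12, 24, 88),
}
for name, counts in examples.items():
    fpr, tpr = roc_point(*counts)
    print(f"{name}: FPR={fpr:.2f}, TPR={tpr:.2f}, {position(fpr, tpr)}")
```

This prints `A: FPR=0.28, TPR=0.63, better than random` and `second: FPR=0.12, TPR=0.76, better than random`. The second classifier lies further toward the top-left corner, i.e., closer to perfect classification.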


ROC curve
- A discrete classifier produces an (FPR, TPR) pair corresponding to a single point in ROC space.
- Some classifiers, such as naive Bayes or a neural network, naturally yield an instance probability or score: a numeric value that represents the degree to which an instance is a member of a class.
- Such a ranking or scoring classifier can be used with a threshold to produce a discrete classifier.
- Plotting the ROC point for each possible threshold value results in a curve.
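The threshold sweep described above can be sketched as follows, assuming an instance is predicted positive when its score is at least the threshold and that labels are 1 (positive) or 0 (negative). The function name `roc_points` and the example scores are hypothetical.

```python
def roc_points(scores, labels):
    """Sweep a threshold over classifier scores and return (FPR, TPR) pairs.

    An instance is predicted positive when score >= threshold.
    labels: 1 for positive, 0 for negative (both classes must be present).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Threshold above every score: nothing is predicted positive.
    points = [(0.0, 0.0)]
    # Each distinct score, from highest to lowest, is a candidate threshold.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


# Hypothetical scores from a probabilistic classifier.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The loop recounts TP and FP for every threshold, which is quadratic but easy to follow; a more efficient version would sort the instances once by score and update the counts incrementally.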

ROC curve
[Figure: example ROC curves, true positive rate vs. false positive rate]
- ROC curves show the tradeoff between sensitivity and specificity.
- The closer the curve follows the upper-left border of the ROC space, the more accurate the test.
- The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test.
- A common summary measure is the area under the ROC curve (AUC).
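Since an ROC curve built from a finite set of thresholds is piecewise linear, its area can be computed exactly with the trapezoidal rule. This is a sketch assuming the input is a list of (FPR, TPR) points from a threshold sweep that includes (0, 0) and (1, 1); the function name `auc_trapezoid` is hypothetical.

```python
def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve via the trapezoidal rule.

    points: (FPR, TPR) pairs, assumed to include (0, 0) and (1, 1).
    They are sorted by FPR (then TPR) before integrating.
    """
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Trapezoid between consecutive points; vertical steps add nothing.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical ROC points from a threshold sweep over six scored instances.
points = [(0.0, 0.0), (0.0, 1 / 3), (0.0, 2 / 3), (1 / 3, 2 / 3),
          (1 / 3, 1.0), (2 / 3, 1.0), (1.0, 1.0)]
print(auc_trapezoid(points))
```

The diagonal `[(0, 0), (1, 1)]` gives 0.5 (random guessing) and a curve through the top-left corner gives 1.0 (perfect ranking); the example points above give 8/9, about 0.889.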

Evaluating a classifier
- How well will the classifier we learned perform on novel data?
- We can estimate the performance (e.g., accuracy, sensitivity) of the classifier using a test data set.
- Performance on the training data is not a good indicator of performance on future data.
- Test set: independent instances that have not been used in any way to create the classifier.
- Assumption: both the training data and the test data are representative samples of the underlying problem.

Holdout & cross-validation methods
- Holdout method: the given data is randomly partitioned into two independent sets:
  - Training set (e.g., 2/3) for model construction
  - Test set (e.g., 1/3) for accuracy estimation
- Random subsampling: a variation of holdout
  - Repeat holdout k times; accuracy = average of the accuracies obtained.
- Cross-validation (k-fold, where k = 10 is most popular)
  - Randomly partition the data into k mutually exclusive subsets $D_1, \dots, D_k$, each of approximately equal size.
  - At the i-th iteration, use $D_i$ as the test set and the others as the training set.
  - Leave-one-out: k folds where k = number of tuples; for small data sets.
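The holdout split and k-fold cross-validation can be sketched with the standard library alone. The classifier is abstracted as a pair of functions `fit(X_train, y_train) -> model` and `predict(model, x) -> label`; these names, and the majority-class stand-in used to exercise them, are hypothetical. The sketch assumes k is at most the number of tuples, so no fold is empty.

```python
import random
from collections import Counter


def holdout_split(n, train_frac=2 / 3, seed=0):
    """Randomly partition indices 0..n-1 into (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_frac)
    return idx[:cut], idx[cut:]


def k_fold_indices(n, k=10, seed=0):
    """Split indices 0..n-1 into k mutually exclusive folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Average accuracy over k iterations; fold i is the test set at iteration i."""
    folds = k_fold_indices(len(y), k, seed)
    accs = []
    for i, test in enumerate(folds):
        # All other folds together form the training set.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([X[j] for j in train], [y[j] for j in train])
        correct = sum(predict(model, X[j]) == y[j] for j in test)
        accs.append(correct / len(test))
    return sum(accs) / len(accs)


def majority_fit(X_train, y_train):
    """Hypothetical baseline: the model is just the most common training label."""
    return Counter(y_train).most_common(1)[0][0]


def majority_predict(model, x):
    """Predict the stored majority label regardless of the input."""
    return model
```

Setting `k = len(y)` gives leave-one-out. For instance, with `y = [0] * 7 + [1] * 3` and `k = 10`, the majority-class baseline scores 0.7, because it misclassifies exactly the three minority tuples.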

Bootstrap
- Works well with small data sets.
- Samples the given training tuples uniformly with replacement, i.e., each time a tuple is selected, it is equally likely to be selected again and re-added to the training set.
- There are several bootstrap methods; a common one is the .632 bootstrap:
  - A data set with d tuples is sampled d times, with replacement, resulting in a training set of d samples. The data tuples that did not make it into the training set end up forming the test set. About 63.2% of the original data end up in the bootstrap sample and the remaining 36.8% form the test set, since a given tuple is missed by all d draws with probability $(1 - 1/d)^d \approx e^{-1} \approx 0.368$ for large d.
  - Repeat the sampling procedure k times; the overall accuracy of the model is
    $Acc(M) = \frac{1}{k}\sum_{i=1}^{k}\left(0.632 \times Acc(M_i)_{\text{test set}} + 0.368 \times Acc(M_i)_{\text{train set}}\right)$
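The two ingredients of the .632 bootstrap, drawing a sample with replacement and combining the per-round accuracies, can be sketched as below. The function names `bootstrap_split` and `acc_632` are hypothetical, and the seed and data-set size in the check are arbitrary choices; training a model on each bootstrap sample is left to whatever classifier is being evaluated.

```python
import math
import random


def bootstrap_split(d, rng):
    """Draw d indices uniformly with replacement (the bootstrap training sample);
    indices never drawn form the test set."""
    train = [rng.randrange(d) for _ in range(d)]
    drawn = set(train)
    test = [i for i in range(d) if i not in drawn]
    return train, test


def acc_632(test_accs, train_accs):
    """.632 bootstrap estimate: average over k rounds of
    0.632 * test-set accuracy + 0.368 * training-set accuracy."""
    k = len(test_accs)
    return sum(0.632 * a_te + 0.368 * a_tr
               for a_te, a_tr in zip(test_accs, train_accs)) / k


# Empirical check of the 36.8% out-of-bag figure.
rng = random.Random(42)
d = 10_000
_, test = bootstrap_split(d, rng)
print(f"out-of-bag fraction: {len(test) / d:.3f}")
print(f"(1 - 1/d)^d = {(1 - 1 / d) ** d:.4f}, e^-1 = {math.exp(-1):.4f}")
```

The out-of-bag fraction printed for a single sample fluctuates slightly around 0.368. As a worked instance of the formula, two rounds with test accuracies 0.8 and 0.7 and training accuracies of 1.0 give `acc_632([0.8, 0.7], [1.0, 1.0])` = 0.842.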

