Data Mining Association Analysis——Basic Concepts and Algorithms Chapter 6 Introduction to Data Mining

Data mining Association Analysis: Basic Concepts and algorithms Lecture Notes for Chapter 6 Introduction to Data Mining Tan, steinbach Kumar O Tan, Steinbach, Kumar Introduction to Data Mining 4/18/2004
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1

Association Rule Mining Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction Market-Basket transactions Example of Association Rules Td tems Bread. milk Diaper}→{Beer MMilk, Bread >Eggs, Coke, Bread, Diaper, Beer, Eggs Beer, Bread}→{Mk Milk, Diaper, Beer, Coke Bread, Milk, Diaper, Beer Implication means co-occurrence, Bread, Milk, Diaper, Coke not causality! n Steinbach. Kumar Introduction to Data Mining 4/18/2004
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 ‹#› Association Rule Mining Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction Market-Basket transactions TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke Example of Association Rules {Diaper} → {Beer}, {Milk, Bread} → {Eggs,Coke}, {Beer, Bread} → {Milk}, Implication means co-occurrence, not causality!

Definition: Frequent Itemset Item set a collection of one or more items Example: Milk, Bread, Diaper] K-itemset TD ltems e An itemset that contains k items Bread, milk Support count(o) Bread, diaper, beer, eggs Frequency of occurrence of an itemset 234 Milk, Diaper, Beer, Coke E.g. o(Milk, Bread, Diaper )=2 Bread, Milk, Diaper, Beer Bread, Milk, Diaper, Coke Support Fraction of transactions that contain an itemset E.g. s(Milk, Bread, Diaper) )=2/5 Frequent Itemset An itemset whose support is greater than or equal to a minsup threshold n Steinbach. Kumar Introduction to Data Mining 4/18/2004
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 ‹#› Definition: Frequent Itemset Itemset – A collection of one or more items ◆ Example: {Milk, Bread, Diaper} – k-itemset ◆ An itemset that contains k items Support count () – Frequency of occurrence of an itemset – E.g. ({Milk, Bread,Diaper}) = 2 Support – Fraction of transactions that contain an itemset – E.g. s({Milk, Bread, Diaper}) = 2/5 Frequent Itemset – An itemset whose support is greater than or equal to a minsup threshold TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke

Definition: association rule Association rule TD tems An implication expression of the form X.. where x and y are itemsets Bread. milk Bread, Diaper, Beer, eggs Example: {Mik, Diaper}→{Ber 2345 Milk, Diaper, Beer, Coke Bread, Milk, Diaper, Beer Bread, Milk, Diaper, Coke Rule evaluation metrics Support (s) e fraction of transactions that contain Example both x and y Mik, Diaper}→Beer Confidence(c) e Measures how often items in y o Milk, Diaper, Beer)2 =0.4 appear in transactions that T contain x C O(Milk, Di Diaper, Beer) 2 0.67 o Milk, Diaper) 3 n Steinbach. Kumar Introduction to Data Mining 4/18/2004
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 ‹#› Definition: Association Rule Example: {Milk ,Diaper } Beer 0.4 5 2 | T | (Milk,Diaper,Beer) = = = s 0.67 3 2 (Milk,Diaper) (Milk,Diaper,Beer) = = = c Association Rule – An implication expression of the form X → Y, where X and Y are itemsets – Example: {Milk, Diaper} → {Beer} Rule Evaluation Metrics – Support (s) ◆ Fraction of transactions that contain both X and Y – Confidence (c) ◆ Measures how often items in Y appear in transactions that contain X TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke

Another Example Customer Customer buys both Customer buys beer Transaction ID Items bought Let minimum support 50%, and 2000 AB c minimum confidence 50%. we 1000AC have 4000 A d A→C(50%,66.6% 5000 B.E.F C→A(50%,100%) n Steinbach. Kumar Introduction to Data Mining 4/18/2004
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 ‹#› Another Example Transaction ID Items Bought 2000 A,B,C 1000 A,C 4000 A,D 5000 B,E,F Let minimum support 50%, and minimum confidence 50%, we have – A C (50%, 66.6%) – C A (50%, 100%) Customer buys diaper Customer buys both Customer buys beer

Association Rule Mining Task Given a set of transactions T, the goal of association rule mining is to find all rules having support 2 minsup threshold confidence> minconf threshold Brute-force approach List all possible association rules Compute the support and confidence for each rule Prune rules that fail the minsup and minconf thresholds Computationally prohibitive n Steinbach. Kumar Introduction to Data Mining 4/18/2004
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 ‹#› Association Rule Mining Task Given a set of transactions T, the goal of association rule mining is to find all rules having – support ≥ minsup threshold – confidence ≥ minconf threshold Brute-force approach: – List all possible association rules – Compute the support and confidence for each rule – Prune rules that fail the minsup and minconf thresholds Computationally prohibitive!

Mining association Rules Example of Rules TID tems Bread. milk MMilk, Diaper>Beer](s=0.4, C=0.67) Bread, Diaper, Beer, eggs MMilk, Beer] >Diaper)(s=0.4, C=1.0) Milk, Diaper, beer, Coke [Diaper, Beer]->Milk(s=0.4, C=0.67) [Beer]->Milk, Diaper](s=0.4, C=0.67) Bread, Milk, Diaper, Beer [Diaper]->Milk, Beer](s=0.4, C=0.5) Bread, Milk, Diaper, Coke MMilk>Diaper, Beer)(s=0.4, C=0.5) Observations All the above rules are binary partitions of the same itemset MIlk, Diaper, Beer] Rules originating from the same itemset have identical support but can have different confidence Thus, we may decouple the support and confidence requirements O Tan, Steinbach, Kumar Introduction to Data Mining 4/18/2004
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 ‹#› Mining Association Rules Example of Rules: {Milk,Diaper} → {Beer} (s=0.4, c=0.67) {Milk,Beer} → {Diaper} (s=0.4, c=1.0) {Diaper,Beer} → {Milk} (s=0.4, c=0.67) {Beer} → {Milk,Diaper} (s=0.4, c=0.67) {Diaper} → {Milk,Beer} (s=0.4, c=0.5) {Milk} → {Diaper,Beer} (s=0.4, c=0.5) TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke Observations: • All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer} • Rules originating from the same itemset have identical support but can have different confidence • Thus, we may decouple the support and confidence requirements

Mining association Rules TWo-step approach 1. Frequent Itemset Generation Generate all itemsets whose support minsup 2. Rule generation Generate high confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset Frequent itemset generation is still computationally expensive O Tan, Steinbach, Kumar Introduction to Data Mining 4/18/2004
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 ‹#› Mining Association Rules Two-step approach: 1. Frequent Itemset Generation – Generate all itemsets whose support minsup 2. Rule Generation – Generate high confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset Frequent itemset generation is still computationally expensive

Frequent Itemset Generation null BD BE ABC)(ABD)(ABE)(ACD)(ACE ADE BCD BCE BDE(CDE ABCD ABCE ABDE ACDE BCDE Given d items. there are 2a possible ABCDE candidate itemsets n Steinbach. Kumar Introduction to Data Mining 4/18/2004
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 ‹#› Frequent Itemset Generation null AB AC AD AE BC BD BE CD CE DE A B C D E ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCD ABCE ABDE ACDE BCDE ABCDE Given d items, there are 2d possible candidate itemsets

Frequent Itemset Generation Brute-force approach Each itemset in the lattice is a candidate frequent itemset Count the support of each candidate by scanning the database Transactions List of Candidates TID tems Bread. milk Bread, Diaper, Beer, Eggs 2345 Milk, Diaper, Beer, Coke Bread, Milk, Diaper, beer Bread, Milk, Diaper, Coke W atch each transaction against every candidate Complexity -O(NMw)=> Expensive since M=2d! ! O Tan, Steinbach, Kumar Introduction to Data Mining 4/18/2004
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 ‹#› Frequent Itemset Generation Brute-force approach: – Each itemset in the lattice is a candidate frequent itemset – Count the support of each candidate by scanning the database – Match each transaction against every candidate – Complexity ~ O(NMw) => Expensive since M = 2d !!! TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke N Transactions List of Candidates M w
按次数下载不扣除下载券;
注册用户24小时内重复下载只扣除一次;
顺序:VIP每日次数-->可用次数-->下载券;
- 《信息安全与管理》课程教学资源(PPT课件讲稿)第六章 公开密钥设施PKI.ppt
- 《计算机应用基础》课程教学资源(PPT课件讲稿)第一章 计算机基础知识.ppt
- 《Computer Networking:A Top Down Approach》英文教材教学资源(PPT课件讲稿,3rd edition)Chapter 5 Link Layer.ppt
- 西安电子科技大学:《微机原理与接口技术》课程教学资源(PPT课件讲稿)第六章 存储器设计.pptx
- 《编译原理》课程教学资源(PPT课件讲稿)第五章 类型检查.ppt
- 《网络搜索和挖掘关键技术 Web Search and Mining》课程教学资源(PPT讲稿)Lecture 10 Query expansion.ppt
- 北京师范大学现代远程教育:《计算机应用基础》课程教学资源(PPT课件讲稿)第一章 计算机常识.ppt
- 中国科学技术大学:《网络信息安全 NETWORK SECURITY》课程教学资源(PPT课件讲稿)UNIX/LINUX 操作系统.ppt
- 哈尔滨工业大学:《语言信息处理》课程教学资源(PPT课件讲稿)机器翻译 I Machine Translation I(主讲:张宇).ppt
- 《操作系统 Operating System》课程教学资源(PPT课件讲稿)概述 Overview.ppt
- 《计算机网络》课程教学大纲(计算机科学与技术、网络工程专业).pdf
- 《计算机组装维修》课程PPT教学课件(实训教程)第3章 主板.ppt
- 山西国际商务职业学院:《数据库应用程序设计》课程教学资源(PPT课件)第7章 VFP6程序设计基础.pps
- 《C语言程序设计》课程教学资源(PPT课件讲稿)第8章 指针.ppt
- 山东大学:《微机原理及单片机接口技术》课程教学资源(PPT课件讲稿)第四章 指令系统及汇编语言程序设计(4.6-4.8).ppt
- 《编译原理与技术》课程教学资源(PPT课件讲稿)自底向上分析.ppt
- 西安交通大学:《物联网技术原理》课程教学资源(PPT课件讲稿)第1章 物联网技术概论(主讲:桂小林).ppt
- 贵州师范学院:《高级语言程序设计 Advanced Programming》课程教学资源(PPT课件讲稿)第7章 函数——模块化设计.ppt
- 计算机问题求解(PPT讲稿)分治法与递归.pptx
- 山东大学:《微机原理及单片机接口技术》课程教学资源(PPT课件讲稿)第三章 计算机系统的组成与工作原理(3.1-3.4).ppt
- 《计算机组成原理》课程教学资源(PPT课件讲稿)第五章 存储器层次结构.ppt
- 电子科技大学:《Unix操作系统基础》课程教学资源(PPT课件)第一章 UNIX操作系统概述、第二章 UNIX使用入门.ppt
- 中国水利水电出版社:《单片机原理及应用》课程PPT教学课件(C语言版)第2章 MCS-51单片机基本结构.ppt
- 《数据结构》课程教学资源(PPT课件讲稿)第三章 栈和队列.ppt
- 《网络安全 Network Security》教学资源(PPT讲稿)Topic 3 User Authentication.pptx
- 《C++语言基础教程》课程电子教案(PPT教学课件)教学资源(PPT课件)第2讲 C++语言基础.ppt
- 长春大学:《计算机应用基础》课程教学资源(PPT课件讲稿)第二章 操作系统.ppt
- 南京大学:《数据结构 Data Structures》课程教学资源(PPT课件讲稿)第二章 线性表.ppt
- 浪潮公司:并行程序、编译与函数库简介、应用软件的调优.ppt
- 《C程序设计》课程电子教案(PPT课件讲稿)第二章 基本数据类型及运算.ppt
- 安徽理工大学:《汇编语言》课程教学资源(PPT课件讲稿)第四章 汇编语言程序格式.ppt
- 清华大学:《网络安全 Network Security》课程教学资源(PPT课件讲稿)Lecture 01 Introduction.pptx
- 《数据结构》课程教学资源(PPT课件讲稿)第六章 集合与字典.ppt
- 华东理工大学:《Visual Basic程序设计教程》课程教学资源(PPT课件)第四讲 VB语言基础(运算符、函数和表达式).pps
- 《软件工程》课程教学资源(PPT课件讲稿)第4章 软件总体设计.ppt
- 《网络综合布线》课程教学资源(PPT讲稿)模块2 综合布线工程设计.ppt
- 数据库接口技术(PPT讲稿)开放式数据库联接 Open DataBase Connectivity——ODBC.ppt
- 《网络系统集成技术》课程教学资源(PPT课件讲稿)第六章 网络互联技术.ppt
- 清华大学出版社:《网络信息安全技术》教材电子教案(PPT课件讲稿)第2章 密码技术.ppt
- 湖南生物机电职业技术学院:《电子商务概论》课程教学资源(PPT课件)第六章 网上支付.ppt