Data Mining Association Analysis——Basic Concepts and Algorithms Chapter 6 Introduction to Data Mining

文档信息

资源类别：文库
文档格式：PPT
文档页数：65
文件大小：1.73MB
团购合买：点击进入团购

内容简介

Data mining Association Analysis: Basic Concepts and algorithms Lecture Notes for Chapter 6 Introduction to Data Mining Tan, steinbach Kumar O Tan, Steinbach, Kumar Introduction to Data Mining 4/18/2004

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1

Association Rule Mining Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction Market-Basket transactions Example of Association Rules Td tems Bread. milk Diaper}→{Beer MMilk, Bread >Eggs, Coke, Bread, Diaper, Beer, Eggs Beer, Bread}→{Mk Milk, Diaper, Beer, Coke Bread, Milk, Diaper, Beer Implication means co-occurrence, Bread, Milk, Diaper, Coke not causality! n Steinbach. Kumar Introduction to Data Mining 4/18/2004

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 ‹#› Association Rule Mining Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction Market-Basket transactions TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke Example of Association Rules {Diaper} → {Beer}, {Milk, Bread} → {Eggs,Coke}, {Beer, Bread} → {Milk}, Implication means co-occurrence, not causality!

Definition: Frequent Itemset Item set a collection of one or more items Example: Milk, Bread, Diaper] K-itemset TD ltems e An itemset that contains k items Bread, milk Support count(o) Bread, diaper, beer, eggs Frequency of occurrence of an itemset 234 Milk, Diaper, Beer, Coke E.g. o(Milk, Bread, Diaper )=2 Bread, Milk, Diaper, Beer Bread, Milk, Diaper, Coke Support Fraction of transactions that contain an itemset E.g. s(Milk, Bread, Diaper) )=2/5 Frequent Itemset An itemset whose support is greater than or equal to a minsup threshold n Steinbach. Kumar Introduction to Data Mining 4/18/2004

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 ‹#› Definition: Frequent Itemset Itemset – A collection of one or more items ◆ Example: {Milk, Bread, Diaper} – k-itemset ◆ An itemset that contains k items Support count () – Frequency of occurrence of an itemset – E.g. ({Milk, Bread,Diaper}) = 2 Support – Fraction of transactions that contain an itemset – E.g. s({Milk, Bread, Diaper}) = 2/5 Frequent Itemset – An itemset whose support is greater than or equal to a minsup threshold TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke

Definition: association rule Association rule TD tems An implication expression of the form X.. where x and y are itemsets Bread. milk Bread, Diaper, Beer, eggs Example: {Mik, Diaper}→{Ber 2345 Milk, Diaper, Beer, Coke Bread, Milk, Diaper, Beer Bread, Milk, Diaper, Coke Rule evaluation metrics Support (s) e fraction of transactions that contain Example both x and y Mik, Diaper}→Beer Confidence(c) e Measures how often items in y o Milk, Diaper, Beer)2 =0.4 appear in transactions that T contain x C O(Milk, Di Diaper, Beer) 2 0.67 o Milk, Diaper) 3 n Steinbach. Kumar Introduction to Data Mining 4/18/2004

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 ‹#› Definition: Association Rule Example: {Milk ,Diaper } Beer 0.4 5 2 | T | (Milk,Diaper,Beer) = = =  s 0.67 3 2 (Milk,Diaper) (Milk,Diaper,Beer) = = =   c Association Rule – An implication expression of the form X → Y, where X and Y are itemsets – Example: {Milk, Diaper} → {Beer} Rule Evaluation Metrics – Support (s) ◆ Fraction of transactions that contain both X and Y – Confidence (c) ◆ Measures how often items in Y appear in transactions that contain X TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke

Another Example Customer Customer buys both Customer buys beer Transaction ID Items bought Let minimum support 50%, and 2000 AB c minimum confidence 50%. we 1000AC have 4000 A d A→C(50%,66.6% 5000 B.E.F C→A(50%,100%) n Steinbach. Kumar Introduction to Data Mining 4/18/2004

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 ‹#› Another Example Transaction ID Items Bought 2000 A,B,C 1000 A,C 4000 A,D 5000 B,E,F Let minimum support 50%, and minimum confidence 50%, we have – A  C (50%, 66.6%) – C  A (50%, 100%) Customer buys diaper Customer buys both Customer buys beer

Association Rule Mining Task Given a set of transactions T, the goal of association rule mining is to find all rules having support 2 minsup threshold confidence> minconf threshold Brute-force approach List all possible association rules Compute the support and confidence for each rule Prune rules that fail the minsup and minconf thresholds Computationally prohibitive n Steinbach. Kumar Introduction to Data Mining 4/18/2004

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 ‹#› Association Rule Mining Task Given a set of transactions T, the goal of association rule mining is to find all rules having – support ≥ minsup threshold – confidence ≥ minconf threshold Brute-force approach: – List all possible association rules – Compute the support and confidence for each rule – Prune rules that fail the minsup and minconf thresholds  Computationally prohibitive!

Mining association Rules Example of Rules TID tems Bread. milk MMilk, Diaper>Beer](s=0.4, C=0.67) Bread, Diaper, Beer, eggs MMilk, Beer] >Diaper)(s=0.4, C=1.0) Milk, Diaper, beer, Coke [Diaper, Beer]->Milk(s=0.4, C=0.67) [Beer]->Milk, Diaper](s=0.4, C=0.67) Bread, Milk, Diaper, Beer [Diaper]->Milk, Beer](s=0.4, C=0.5) Bread, Milk, Diaper, Coke MMilk>Diaper, Beer)(s=0.4, C=0.5) Observations All the above rules are binary partitions of the same itemset MIlk, Diaper, Beer] Rules originating from the same itemset have identical support but can have different confidence Thus, we may decouple the support and confidence requirements O Tan, Steinbach, Kumar Introduction to Data Mining 4/18/2004

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 ‹#› Mining Association Rules Example of Rules: {Milk,Diaper} → {Beer} (s=0.4, c=0.67) {Milk,Beer} → {Diaper} (s=0.4, c=1.0) {Diaper,Beer} → {Milk} (s=0.4, c=0.67) {Beer} → {Milk,Diaper} (s=0.4, c=0.67) {Diaper} → {Milk,Beer} (s=0.4, c=0.5) {Milk} → {Diaper,Beer} (s=0.4, c=0.5) TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke Observations: • All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer} • Rules originating from the same itemset have identical support but can have different confidence • Thus, we may decouple the support and confidence requirements

Mining association Rules TWo-step approach 1. Frequent Itemset Generation Generate all itemsets whose support minsup 2. Rule generation Generate high confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset Frequent itemset generation is still computationally expensive O Tan, Steinbach, Kumar Introduction to Data Mining 4/18/2004

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 ‹#› Mining Association Rules Two-step approach: 1. Frequent Itemset Generation – Generate all itemsets whose support  minsup 2. Rule Generation – Generate high confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset Frequent itemset generation is still computationally expensive

Frequent Itemset Generation null BD BE ABC)(ABD)(ABE)(ACD)(ACE ADE BCD BCE BDE(CDE ABCD ABCE ABDE ACDE BCDE Given d items. there are 2a possible ABCDE candidate itemsets n Steinbach. Kumar Introduction to Data Mining 4/18/2004

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 ‹#› Frequent Itemset Generation null AB AC AD AE BC BD BE CD CE DE A B C D E ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCD ABCE ABDE ACDE BCDE ABCDE Given d items, there are 2d possible candidate itemsets

Frequent Itemset Generation Brute-force approach Each itemset in the lattice is a candidate frequent itemset Count the support of each candidate by scanning the database Transactions List of Candidates TID tems Bread. milk Bread, Diaper, Beer, Eggs 2345 Milk, Diaper, Beer, Coke Bread, Milk, Diaper, beer Bread, Milk, Diaper, Coke W atch each transaction against every candidate Complexity -O(NMw)=> Expensive since M=2d! ! O Tan, Steinbach, Kumar Introduction to Data Mining 4/18/2004

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 ‹#› Frequent Itemset Generation Brute-force approach: – Each itemset in the lattice is a candidate frequent itemset – Count the support of each candidate by scanning the database – Match each transaction against every candidate – Complexity ~ O(NMw) => Expensive since M = 2d !!! TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke N Transactions List of Candidates M w

共65页，可试读20页，点击继续阅读 ↓

刷新页面下载完整文档

VIP每日下载上限内不扣除下载券和下载次数；
按次数下载不扣除下载券；
注册用户24小时内重复下载只扣除一次；
顺序：VIP每日次数-->可用次数-->下载券；

点击下载完整版文档（PPT）