中国高校课件下载中心 》 教学资源 》 大学文库

北京航空航天大学:《数据挖掘——概念和技术(Data Mining - Concepts and Techniques)》课程教学资源(PPT课件讲稿)Chapter 05 Mining Frequent Patterns, Association and Correlations

文档信息
资源类别:文库
文档格式:PPT
文档页数:96
文件大小:2.14MB
团购合买:点击进入团购
内容简介
◼ Basic concepts and a road map ◼ Efficient and scalable frequent itemset mining methods ◼ Mining various kinds of association rules ◼ From association mining to correlation analysis ◼ Constraint-based association mining ◼ Summary
刷新页面文档预览

Chapter 5: Mining Frequent Patterns Association and correlations Basic concepts and a road map Efficient and scalable frequent itemset mining methods Mining various kinds of association rules From association mining to correlation analysIs Constraint-based association mining Summary February 4, 2021 Data Mining: Concepts and Techniques

February 4, 2021 Data Mining: Concepts and Techniques 3 Chapter 5: Mining Frequent Patterns, Association and Correlations ◼ Basic concepts and a road map ◼ Efficient and scalable frequent itemset mining methods ◼ Mining various kinds of association rules ◼ From association mining to correlation analysis ◼ Constraint-based association mining ◼ Summary

What Is frequent pattern analysis? Frequent pattern: a pattern(a set of items, subsequences, substructures etc. that occurs frequently in a data set First proposed by agrawal, imielinski, and Swami [ais3] in the context of frequent itemsets and association rule mining Motivation Finding inherent regularities in data What products were often purchased together?-Beer and diapers? What are the subsequent purchases after buying a pc? What kinds of DNa are sensitive to this new drug Can we automatically classify web documents? pplications Basket data analysis cross-marketing catalog design, sale campaign analysis, Web log(click stream)analysis and dNa sequence analysis February 4, 2021 Data Mining: Concepts and Techniques

February 4, 2021 Data Mining: Concepts and Techniques 4 What Is Frequent Pattern Analysis? ◼ Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set ◼ First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining ◼ Motivation: Finding inherent regularities in data ◼ What products were often purchased together?— Beer and diapers?! ◼ What are the subsequent purchases after buying a PC? ◼ What kinds of DNA are sensitive to this new drug? ◼ Can we automatically classify web documents? ◼ Applications ◼ Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis

Why Is Freq. Pattern Mining Important? Discloses an intrinsic and important property of data sets Forms the foundation for many essential data mining tasks Association, correlation, and causality analysis Sequential, structural(e.g,, sub-graph) patterns Pattern analysis in spatiotemporal, multimedia time series and stream data Classification associative classification Cluster analysis: frequent pattern-based clustering Data warehousing iceberg cube and cube -gradient Semantic data compression fascicles Broad applications February 4, 2021 Data Mining: Concepts and Techniques 5

February 4, 2021 Data Mining: Concepts and Techniques 5 Why Is Freq. Pattern Mining Important? ◼ Discloses an intrinsic and important property of data sets ◼ Forms the foundation for many essential data mining tasks ◼ Association, correlation, and causality analysis ◼ Sequential, structural (e.g., sub-graph) patterns ◼ Pattern analysis in spatiotemporal, multimedia, time￾series, and stream data ◼ Classification: associative classification ◼ Cluster analysis: frequent pattern-based clustering ◼ Data warehousing: iceberg cube and cube-gradient ◼ Semantic data compression: fascicles ◼ Broad applications

Basic Concepts: Frequent Patterns and Association rules Transaction-id Items boug Itemset X={X1…,} A.B. D Find all the rules x>with minimum 20 A.C. D support and confidence 30 A,DE pport s, probability that a 40 B,EF transaction contains xu y 50 B, C,,EF confidence, c conditional Customer Customer probability that a transaction DUVS DO buys diaper having X also contains r Let supmin=50%, COl 50% Freg. Pat :A: 3, B: 3, D 4 E: 3, AD: 3y Association rules. Customer A→D(60%,100% buys beer D→A(60%,75%) February 4, 2021 Data Mining: Concepts and Techniques

February 4, 2021 Data Mining: Concepts and Techniques 6 Basic Concepts: Frequent Patterns and Association Rules ◼ Itemset X = {x1 , …, xk} ◼ Find all the rules X → Y with minimum support and confidence ◼ support, s, probability that a transaction contains X  Y ◼ confidence, c, conditional probability that a transaction having X also contains Y Let supmin = 50%, confmin = 50% Freq. Pat.: {A:3, B:3, D:4, E:3, AD:3} Association rules: A → D (60%, 100%) D → A (60%, 75%) Customer buys diaper Customer buys both Customer buys beer Transaction-id Items bought 10 A, B, D 20 A, C, D 30 A, D, E 40 B, E, F 50 B, C, D, E, F

Closed patterns and max-Patterns A long pattern contains a combinatorial number of sub- patterns e.g,{a…,a1o0 contains(10)+(10)+…+ (100)=20-1=1.27*1030 sub-patterns Solution: Mine closed patterns and max-patterns instead An itemset X is closed if X is freguent and there exists no super-patternY x, with the same support as X (proposed by pasquier, et al. ICDT99) An itemset X is a max-pattern if X is frequent and there exists no frequent super-pattern y X(proposed by Bayardo SigMOd98) Closed pattern is a lossless compression of freq. patterns Reducing the of patterns and rules February 4, 2021 Data Mining: Concepts and Techniques 7

February 4, 2021 Data Mining: Concepts and Techniques 7 Closed Patterns and Max-Patterns ◼ A long pattern contains a combinatorial number of sub￾patterns, e.g., {a1 , …, a100} contains (100 1 ) + (100 2 ) + … + (1 1 0 0 0 0 ) = 2 100 – 1 = 1.27*1030 sub-patterns! ◼ Solution: Mine closed patterns and max-patterns instead ◼ An itemset X is closed if X is frequent and there exists no super-pattern Y כ X, with the same support as X (proposed by Pasquier, et al. @ ICDT’99) ◼ An itemset X is a max-pattern if X is frequent and there exists no frequent super-pattern Y כ X (proposed by Bayardo @ SIGMOD’98) ◼ Closed pattern is a lossless compression of freq. patterns ◼ Reducing the # of patterns and rules

Closed patterns and max-Patterns ■ Exercise.DB={ 50 a Min_sup=1 What is the set of closed itemset? ■:1 :2 What is the set of max-pattern <a1n…ta100 What is the set of all patterns? February 4, 2021 Data Mining: Concepts and Techniques 8

February 4, 2021 Data Mining: Concepts and Techniques 8 Closed Patterns and Max-Patterns ◼ Exercise. DB = {, } ◼ Min_sup = 1. ◼ What is the set of closed itemset? ◼ : 1 ◼ : 2 ◼ What is the set of max-pattern? ◼ : 1 ◼ What is the set of all patterns? ◼ !!

Chapter 5: Mining Frequent Patterns Association and correlations Basic concepts and a road map Efficient and scalable frequent itemset mining methods Mining various kinds of association rules From association mining to correlation analysIs Constraint-based association mining Summary February 4, 2021 Data Mining: Concepts and Techniques

February 4, 2021 Data Mining: Concepts and Techniques 9 Chapter 5: Mining Frequent Patterns, Association and Correlations ◼ Basic concepts and a road map ◼ Efficient and scalable frequent itemset mining methods ◼ Mining various kinds of association rules ◼ From association mining to correlation analysis ◼ Constraint-based association mining ◼ Summary

Scalable methods for Mining frequent patterns The downward closure property of frequent patterns a Any subset of a frequent itemset must be frequent If ibeer, diaper, nuts is frequent, so is ibeer diaper i. e, every transaction having beer, diaper, nuts also contains (beer, diaper Scalable mining methods: Three major approaches Apriori (agrawal srikant@VLDB94 Freq. pattern growth(ePgrowth-Han, Pei yin @SIGMOD00) Vertical data format approach(Charm-Zaki Hsiao @SDM02) February 4, 2021 Data Mining: Concepts and Techniques 10

February 4, 2021 Data Mining: Concepts and Techniques 10 Scalable Methods for Mining Frequent Patterns ◼ The downward closure property of frequent patterns ◼ Any subset of a frequent itemset must be frequent ◼ If {beer, diaper, nuts} is frequent, so is {beer, diaper} ◼ i.e., every transaction having {beer, diaper, nuts} also contains {beer, diaper} ◼ Scalable mining methods: Three major approaches ◼ Apriori (Agrawal & Srikant@VLDB’94) ◼ Freq. pattern growth (FPgrowth—Han, Pei & Yin @SIGMOD’00) ◼ Vertical data format approach (Charm—Zaki & Hsiao @SDM’02)

Apriori: a candidate Generation-and-Test approach Apriori pruning principle: If there is any itemset which is infrequent, its superset should not be generated /tested (Agrawal srikant @VLDB 94, Mannila, et al.@ KDD 94) Method Initially, scan DB once to get frequent 1-itemset Generate length( k+1)candidate itemsets from length k frequent itemsets Test the candidates against dB Terminate when no frequent or candidate set can be generated February 4, 2021 Data Mining: Concepts and Techniques 11

February 4, 2021 Data Mining: Concepts and Techniques 11 Apriori: A Candidate Generation-and-Test Approach ◼ Apriori pruning principle: If there is any itemset which is infrequent, its superset should not be generated/tested! (Agrawal & Srikant @VLDB’94, Mannila, et al. @ KDD’ 94) ◼ Method: ◼ Initially, scan DB once to get frequent 1-itemset ◼ Generate length (k+1) candidate itemsets from length k frequent itemsets ◼ Test the candidates against DB ◼ Terminate when no frequent or candidate set can be generated

The apriori algorithm-An Example Supin =2 ItemsetIsup Database TDB Itemset I sup IAN 2 Tid Items B}3 IAS 4C}3 {B} A, C D B, C. E can IC 20 2333 30A, B, C, E 但}3 3 40 B, E 2L Itemset sup 2 Itemset L,「 ItemsetIsup {A,B} {A,C} 匚AC}2 2nd scan A, B {A,C} tB, C 2 B,C}2 {A,E} 1B, Ey 3 {B, {B,C} C, El 2 {B,E} C. Itemset 3rd scan Itemset I sup AB, CE B, C,E] 2 February 4, 2021 Data Mining: Concepts and Techniques 12

February 4, 2021 Data Mining: Concepts and Techniques 12 The Apriori Algorithm—An Example Database TDB 1 st scan C1 L1 L2 C2 C2 2 nd scan C3 3 L3 rd scan Tid Items 10 A, C, D 20 B, C, E 30 A, B, C, E 40 B, E Itemset sup {A} 2 {B} 3 {C} 3 {D} 1 {E} 3 Itemset sup {A} 2 {B} 3 {C} 3 {E} 3 Itemset {A, B} {A, C} {A, E} {B, C} {B, E} {C, E} Itemset sup {A, B} 1 {A, C} 2 {A, E} 1 {B, C} 2 {B, E} 3 {C, E} 2 Itemset sup {A, C} 2 {B, C} 2 {B, E} 3 {C, E} 2 Itemset {B, C, E} Itemset sup {B, C, E} 2 Supmin = 2

刷新页面下载完整文档
VIP每日下载上限内不扣除下载券和下载次数;
按次数下载不扣除下载券;
注册用户24小时内重复下载只扣除一次;
顺序:VIP每日次数-->可用次数-->下载券;
相关文档