Beihang University: "Data Mining - Concepts and Techniques" course teaching resources (PPT lecture slides), Chapter 05 Mining Frequent Patterns, Association and Correlations

February 4, 2021 Data Mining: Concepts and Techniques

Chapter 5: Mining Frequent Patterns, Association and Correlations
◼ Basic concepts and a road map
◼ Efficient and scalable frequent itemset mining methods
◼ Mining various kinds of association rules
◼ From association mining to correlation analysis
◼ Constraint-based association mining
◼ Summary

What Is Frequent Pattern Analysis?
◼ Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
◼ First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining
◼ Motivation: Finding inherent regularities in data
  ◼ What products were often purchased together? Beer and diapers?!
  ◼ What are the subsequent purchases after buying a PC?
  ◼ What kinds of DNA are sensitive to this new drug?
  ◼ Can we automatically classify web documents?
◼ Applications
  ◼ Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis

Why Is Frequent Pattern Mining Important?
◼ Discloses an intrinsic and important property of data sets
◼ Forms the foundation for many essential data mining tasks
  ◼ Association, correlation, and causality analysis
  ◼ Sequential, structural (e.g., sub-graph) patterns
  ◼ Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
  ◼ Classification: associative classification
  ◼ Cluster analysis: frequent pattern-based clustering
  ◼ Data warehousing: iceberg cube and cube-gradient
  ◼ Semantic data compression: fascicles
  ◼ Broad applications

Basic Concepts: Frequent Patterns and Association Rules

Transaction-id   Items bought
10               A, B, D
20               A, C, D
30               A, D, E
40               B, E, F
50               B, C, D, E, F

[Figure: Venn diagram of customers who buy beer, customers who buy diapers, and customers who buy both]

◼ Itemset X = {x1, …, xk}
◼ Find all the rules X → Y with minimum support and confidence
  ◼ support, s: probability that a transaction contains X ∪ Y
  ◼ confidence, c: conditional probability that a transaction having X also contains Y

Let sup_min = 50%, conf_min = 50%
Freq. Pat.: {A:3, B:3, D:4, E:3, AD:3}
Association rules:
  A → D (support 60%, confidence 100%)
  D → A (support 60%, confidence 75%)
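The support and confidence figures on this slide can be checked by direct counting. The following is a minimal sketch, not part of the original slides; the helper names `count`, `support`, and `confidence` are illustrative assumptions.

```python
# Transactions from the slide's table (Transaction-id -> items bought).
transactions = {
    10: {"A", "B", "D"},
    20: {"A", "C", "D"},
    30: {"A", "D", "E"},
    40: {"B", "E", "F"},
    50: {"B", "C", "D", "E", "F"},
}


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for items in transactions.values() if itemset <= items)


def support(lhs, rhs):
    """Fraction of transactions containing lhs ∪ rhs."""
    return count(set(lhs) | set(rhs)) / len(transactions)


def confidence(lhs, rhs):
    """Fraction of the transactions containing lhs that also contain rhs."""
    return count(set(lhs) | set(rhs)) / count(lhs)


for lhs, rhs in [("A", "D"), ("D", "A")]:
    print(f"{lhs} -> {rhs}: support={support(lhs, rhs):.0%}, "
          f"confidence={confidence(lhs, rhs):.0%}")
```

Both rules share the same support because support depends only on the union X ∪ Y; the confidences differ because the denominators (transactions containing A versus transactions containing D) differ.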

Closed Patterns and Max-Patterns
◼ A long pattern contains a combinatorial number of sub-patterns, e.g., {a1, …, a100} contains $\binom{100}{1} + \binom{100}{2} + \dots + \binom{100}{100} = 2^{100} - 1 \approx 1.27 \times 10^{30}$ sub-patterns!
◼ Solution: Mine closed patterns and max-patterns instead
◼ An itemset X is closed if X is frequent and there exists no super-pattern Y ⊃ X with the same support as X (proposed by Pasquier, et al. @ ICDT'99)
◼ An itemset X is a max-pattern if X is frequent and there exists no frequent super-pattern Y ⊃ X (proposed by Bayardo @ SIGMOD'98)
◼ Closed pattern is a lossless compression of frequent patterns
  ◼ Reducing the number of patterns and rules
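The sub-pattern count follows from the binomial theorem. The short derivation below is added for reference and is not part of the original slide:

```latex
\sum_{k=1}^{100} \binom{100}{k}
  = \sum_{k=0}^{100} \binom{100}{k} - \binom{100}{0}
  = (1 + 1)^{100} - 1
  = 2^{100} - 1
  \approx 1.27 \times 10^{30}
```

Each of the 100 items is either in or out of a sub-pattern, which gives 2^100 subsets; subtracting 1 removes the empty set.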

Closed Patterns and Max-Patterns
◼ Exercise. DB = {<a1, …, a100>, <a1, …, a50>}
  ◼ Min_sup = 1.
◼ What is the set of closed itemsets?
  ◼ <a1, …, a100>: 1
  ◼ <a1, …, a50>: 2
◼ What is the set of max-patterns?
  ◼ <a1, …, a100>: 1
◼ What is the set of all patterns?
  ◼ !! (every non-empty subset of {a1, …, a100}, i.e. 2^100 - 1 itemsets, far too many to enumerate)
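The exercise is too large to enumerate, but a scaled-down stand-in shows the same behaviour. The sketch below is an illustration only: it assumes 4 items instead of 100 and a 2-item transaction instead of the 50-item one, and it uses brute-force enumeration rather than any published closed-pattern miner.

```python
from itertools import combinations

# Scaled-down stand-in for the exercise (an assumption for illustration):
# <a1, ..., a4> plays the role of <a1, ..., a100>, <a1, a2> of <a1, ..., a50>.
db = [("a1", "a2", "a3", "a4"), ("a1", "a2")]
min_sup = 1

items = sorted({i for t in db for i in t})

# Brute force: count the support of every non-empty itemset.
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = frozenset(combo)
        n = sum(1 for t in db if s <= set(t))
        if n >= min_sup:
            frequent[s] = n

# Closed: no proper superset with the same support.
# (Such a superset would itself be frequent, so searching `frequent` suffices.)
closed = {x: n for x, n in frequent.items()
          if not any(x < y and frequent[y] == n for y in frequent)}

# Max-pattern: no frequent proper superset at all.
maximal = {x: n for x, n in frequent.items()
           if not any(x < y for y in frequent)}

print(len(frequent), "frequent itemsets")
print("closed: ", {tuple(sorted(x)): n for x, n in closed.items()})
print("maximal:", {tuple(sorted(x)): n for x, n in maximal.items()})
```

As in the full exercise, there are two closed itemsets (the two transactions themselves, with supports 2 and 1), a single max-pattern, and 2^4 - 1 = 15 frequent itemsets overall.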

Chapter 5: Mining Frequent Patterns, Association and Correlations
◼ Basic concepts and a road map
◼ Efficient and scalable frequent itemset mining methods
◼ Mining various kinds of association rules
◼ From association mining to correlation analysis
◼ Constraint-based association mining
◼ Summary

Scalable Methods for Mining Frequent Patterns
◼ The downward closure property of frequent patterns
  ◼ Any subset of a frequent itemset must be frequent
  ◼ If {beer, diaper, nuts} is frequent, so is {beer, diaper}
  ◼ i.e., every transaction having {beer, diaper, nuts} also contains {beer, diaper}
◼ Scalable mining methods: Three major approaches
  ◼ Apriori (Agrawal & Srikant @VLDB'94)
  ◼ Frequent pattern growth (FPgrowth: Han, Pei & Yin @SIGMOD'00)
  ◼ Vertical data format approach (Charm: Zaki & Hsiao @SDM'02)
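The downward closure property can be observed on a small example. The basket data below is hypothetical, invented only for illustration, and `support_count` is an assumed helper name.

```python
from itertools import combinations

# Hypothetical basket data, invented for illustration only.
baskets = [
    {"beer", "diaper", "nuts"},
    {"beer", "diaper", "nuts", "milk"},
    {"beer", "diaper"},
    {"milk", "nuts"},
]


def support_count(itemset):
    """Number of baskets containing every item in `itemset`."""
    return sum(1 for b in baskets if set(itemset) <= b)


full = ("beer", "diaper", "nuts")
print(full, support_count(full))

# Every basket that contains the full itemset also contains each subset,
# so a subset's support count can never be smaller.
for k in range(1, len(full)):
    for subset in combinations(full, k):
        assert support_count(subset) >= support_count(full)
        print(subset, support_count(subset))
```

The contrapositive is what mining algorithms exploit: once an itemset is found to be infrequent, none of its supersets needs to be counted.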

Apriori: A Candidate Generation-and-Test Approach
◼ Apriori pruning principle: If there is any itemset which is infrequent, its superset should not be generated/tested! (Agrawal & Srikant @VLDB'94, Mannila, et al. @ KDD'94)
◼ Method:
  ◼ Initially, scan DB once to get the frequent 1-itemsets
  ◼ Generate length (k+1) candidate itemsets from length k frequent itemsets
  ◼ Test the candidates against DB
  ◼ Terminate when no frequent or candidate set can be generated
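The four method steps can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not the implementation from the cited papers: the names `apriori` and `min_count` are illustrative, the join is a plain union of pairs rather than the prefix-based join of the original algorithm, and the database is rescanned naively for every candidate.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count_and_filter(candidates):
        # One pass over the DB per candidate; keep those meeting min_count.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    # Initial scan: frequent 1-itemsets.
    level = count_and_filter({frozenset([i]) for t in transactions for i in t})
    frequent = dict(level)
    k = 1
    while level:  # terminates when no frequent k-itemsets remain
        # Generate length-(k+1) candidates from frequent k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Apriori pruning: drop any candidate with an infrequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        # Test the surviving candidates against the DB.
        level = count_and_filter(candidates)
        frequent.update(level)
        k += 1
    return frequent


# The five transactions from the basic-concepts slide of this chapter.
db = [{"A", "B", "D"}, {"A", "C", "D"}, {"A", "D", "E"},
      {"B", "E", "F"}, {"B", "C", "D", "E", "F"}]
result = apriori(db, min_count=3)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With `min_count=3` (a 50% minimum support over five transactions), the result is A:3, B:3, D:4, E:3 and AD:3, which matches the frequent patterns listed on that slide.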

The Apriori Algorithm: An Example (Supmin = 2)

Database TDB
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

1st scan → C1
Itemset   sup
{A}       2
{B}       3
{C}       3
{D}       1
{E}       3

L1
Itemset   sup
{A}       2
{B}       3
{C}       3
{E}       3

C2 (candidates generated from L1)
Itemset
{A, B}
{A, C}
{A, E}
{B, C}
{B, E}
{C, E}

2nd scan → C2 with counts
Itemset   sup
{A, B}    1
{A, C}    2
{A, E}    1
{B, C}    2
{B, E}    3
{C, E}    2

L2
Itemset   sup
{A, C}    2
{B, C}    2
{B, E}    3
{C, E}    2

C3 (candidates generated from L2)
Itemset
{B, C, E}

3rd scan → L3
Itemset     sup
{B, C, E}   2
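The step from L2 to C3 in this example shows the pruning principle at work. The sketch below reproduces just that step; it is an illustration that assumes a simple union-based join, whereas the original algorithm joins only itemsets sharing a common prefix and so never forms the two pruned candidates in the first place.

```python
from itertools import combinations

# TDB and L2 as given in the example (Supmin = 2).
tdb = {10: {"A", "C", "D"}, 20: {"B", "C", "E"},
       30: {"A", "B", "C", "E"}, 40: {"B", "E"}}
L2 = {frozenset("AC"), frozenset("BC"), frozenset("BE"), frozenset("CE")}

# Join: unions of two frequent 2-itemsets that form a 3-itemset.
joined = {a | b for a in L2 for b in L2 if len(a | b) == 3}

# Prune: keep a candidate only if all of its 2-subsets are in L2.
C3 = {c for c in joined
      if all(frozenset(s) in L2 for s in combinations(c, 2))}

# 3rd scan: count the surviving candidates against TDB.
L3 = {c: sum(1 for items in tdb.values() if c <= items) for c in C3}
L3 = {c: n for c, n in L3.items() if n >= 2}

print("joined:", sorted("".join(sorted(c)) for c in joined))
print("C3:    ", sorted("".join(sorted(c)) for c in C3))
print("L3:    ", {"".join(sorted(c)): n for c, n in L3.items()})
```

Under this union join, {A, B, C} and {A, C, E} are formed but discarded without touching the database, because {A, B} and {A, E} are not in L2. Only {B, C, E} reaches the third scan.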