Data Warehousing & Data Mining: Course Teaching Resources (Lecture Slides), Ch 2 Discovering Association Rules

COMP 578 Data Warehousing & Data Mining
Ch 2 Discovering Association Rules
Keith C.C. Chan, Department of Computing, The Hong Kong Polytechnic University

The AR Mining Problem
◼ Given a database of transactions.
  ◼ Each transaction is a list of items, e.g., the items purchased by a customer in one visit.
◼ Find all rules that correlate the presence of one set of items with that of another set of items.
  ◼ E.g., 30% of people who buy diapers also buy beer.
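The kind of statement in the last bullet can be computed directly once transactions are stored as sets. The following is a minimal sketch, assuming each transaction is a Python `set` of item names; the four baskets and the helper name `pct_also_buying` are hypothetical and not taken from the slides.

```python
# Hypothetical toy data: each transaction is the set of items bought in one visit.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "bread"},
    {"diapers", "milk"},
    {"beer", "chips"},
]


def pct_also_buying(transactions, given_item, other_item):
    """Percentage of transactions containing `given_item` that also contain `other_item`."""
    with_given = [t for t in transactions if given_item in t]
    if not with_given:
        return 0.0  # no transaction contains `given_item`; treated as 0% in this sketch
    both = sum(1 for t in with_given if other_item in t)
    return 100.0 * both / len(with_given)


# Only 1 of the 3 diaper baskets also contains beer, i.e. about 33.3%.
print(f"{pct_also_buying(transactions, 'diapers', 'beer'):.1f}% of diaper buyers also buy beer")
```

A rule miner has to evaluate such statements for every combination of item sets rather than for one hand-picked pair, which is why interestingness measures are needed to decide which rules to report.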

Motivation & Applications (1)
◼ If we can find such associations, we will be able to answer:
  ◼ ??? → beer (What should the company do to boost beer sales?)
  ◼ Diapers → ??? (What other products should the store stock up on?)
◼ Attached mailing in direct marketing.

Motivation & Applications (2)
◼ Originally for marketing, to understand purchasing trends.
  ◼ What products or services do customers tend to purchase at the same time, or later on?
◼ Use market basket analysis to plan:
  ◼ Coupons and discounting:
    ◼ Do not offer simultaneous discounts on beer and diapers if they tend to be bought together.
    ◼ Discount one to pull in sales of the other.
  ◼ Product placement:
    ◼ Place products that have a strong purchasing relationship close together.
    ◼ Place such products far apart to increase traffic past other items.

Measure of Interestingness
◼ For a data mining algorithm to mine for interesting association rules, users have to define a measure of “interestingness”.
◼ Two popular interestingness measures have been proposed:
  ◼ Support and Confidence
  ◼ Lift Ratio (Interest)
◼ MineSet from SGI uses the terms prevalence and predictability instead of support and confidence, respectively.

The Support and Confidence
◼ Given a rule X & Y => Z:
◼ Support, $S = P(X \cup Y \cup Z)$, where $A \cup B$ indicates that a transaction contains both item sets A and B (the union of item sets A and B).
  ◼ S = # of tuples containing X & Y & Z / total # of tuples
◼ Confidence, $C = P(Z \mid X \cup Y)$, the conditional probability that a transaction containing $X \cup Y$ also contains Z.
  ◼ C = # of tuples containing X & Y & Z / # of tuples containing X & Y
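Both formulas reduce to counting transactions. A minimal sketch, assuming transactions are Python sets and item sets are sets of item names; the five transactions below are invented for illustration, and returning 0 when the antecedent never occurs is an arbitrary choice made here (confidence is undefined in that case).

```python
# Hypothetical transactions (not from the slides); each one is a set of items.
transactions = [
    {"X", "Y", "Z"},
    {"X", "Y"},
    {"X", "Z"},
    {"Y", "Z"},
    {"X", "Y", "Z", "W"},
]


def support(itemset, transactions):
    """P(itemset): fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent) = support(antecedent ∪ consequent) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # undefined when the antecedent never occurs; 0 is this sketch's choice
    return support(antecedent | consequent, transactions) / base


# Rule X & Y => Z
lhs, rhs = {"X", "Y"}, {"Z"}
s = support(lhs | rhs, transactions)    # 2 of the 5 transactions contain X, Y and Z
c = confidence(lhs, rhs, transactions)  # 2 of the 3 transactions with X and Y also contain Z
print(f"support = {s:.0%}, confidence = {c:.1%}")
```

Dividing the two supports gives the same value as counting tuples directly (tuples containing X & Y & Z over tuples containing X & Y), because the total number of tuples cancels.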

The Support and Confidence
Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F
◼ Let minimum support = 50% and minimum confidence = 50%. Find the S and C of:
  1. A → C
  2. C → A
[Figure: Venn diagram of customers who buy diapers, customers who buy beer, and customers who buy both]
◼ Answer: A → C (50%, 66.7%); C → A (50%, 100%)
  ◼ A and C occur together in 2 of the 4 transactions, so S = 50% for both rules. A occurs in 3 transactions, so C(A → C) = 2/3; C occurs in 2, so C(C → A) = 2/2.
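The answer can be checked, and extended to every candidate rule, with a brute-force search over the four transactions in the table. This is a sketch only: it enumerates every item set and every split into antecedent and consequent, which is exponential in the number of items and is not an efficient mining algorithm; the names `support` and `mine_rules` are hypothetical.

```python
from itertools import combinations

# Transaction table from the slide (transaction ID -> items bought).
transactions = {
    2000: {"A", "B", "C"},
    1000: {"A", "C"},
    4000: {"A", "D"},
    5000: {"B", "E", "F"},
}


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def mine_rules(min_sup, min_conf):
    """Brute-force search: test every rule X -> Y with X, Y disjoint and non-empty."""
    all_items = sorted(set().union(*transactions.values()))
    rules = []
    for size in range(2, len(all_items) + 1):
        for combo in combinations(all_items, size):
            itemset = frozenset(combo)
            s = support(itemset)
            if s == 0 or s < min_sup:
                continue  # also guarantees support(x) >= s > 0 below
            # Every non-empty proper subset of the item set is a candidate antecedent X.
            for k in range(1, size):
                for lhs in combinations(combo, k):
                    x = frozenset(lhs)
                    c = s / support(x)
                    if c >= min_conf:
                        rules.append((sorted(x), sorted(itemset - x), s, c))
    return rules


for x, y, s, c in mine_rules(0.5, 0.5):
    print(f"{x} -> {y}: support {s:.0%}, confidence {c:.1%}")
```

With both thresholds at 50%, only {A, C} is frequent, so the search reports exactly the two rules in the answer: A → C with confidence 2/3 and C → A with confidence 1.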

How Good is a Predictive Model?
◼ Response curves: how does the response rate of a targeted selection compare to that of a random selection?
[Figure: response rate (0 to 100%) for optimal selection, targeted selection, and random selection, plotted against customers ordered from most likely to least likely to respond]
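A point on a response curve can be computed by ranking customers by a model score and measuring the response rate among the top-ranked fraction; a random selection of any size is expected to respond at the overall base rate. The scores and outcomes below are hypothetical, as is the helper `response_rate_of_top`.

```python
# Hypothetical data: (model score, did the customer respond?) for ten customers.
customers = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True), (0.60, False),
    (0.50, False), (0.40, True), (0.30, False), (0.20, False), (0.10, False),
]


def response_rate_of_top(customers, fraction):
    """Response rate among the top `fraction` of customers ranked by model score."""
    ranked = sorted(customers, key=lambda c: c[0], reverse=True)
    k = max(1, round(fraction * len(ranked)))  # number of customers selected
    return sum(1 for _, responded in ranked[:k] if responded) / k


# A random selection is expected to respond at the overall base rate, whatever its size.
base_rate = sum(1 for _, responded in customers if responded) / len(customers)

for fraction in (0.2, 0.5, 1.0):
    targeted = response_rate_of_top(customers, fraction)
    print(f"top {fraction:.0%}: targeted {targeted:.0%} vs random {base_rate:.0%}")
```

With these made-up numbers the top 20% responds at 100%, the top 50% at 60%, and the whole list at the 40% base rate, so the targeted curve sits furthest above the random line for small selections and meets it when everyone is selected.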

What is a Lift Ratio? (1)
◼ Consider the rule:
  ◼ When people buy diapers, they also buy beer 50 percent of the time.
  ◼ It states an explicit percentage (50% of the time).
◼ Consider this other rule:
  ◼ People who purchase a VCR are three times more likely to also purchase a camcorder.
  ◼ This rule uses the comparative phrase “three times more likely”.

What is a Lift Ratio? (2)
◼ The probability is compared to the baseline likelihood.
◼ The baseline likelihood is the probability of the event occurring independently.
  ◼ E.g., if people normally buy beer 5% of the time, then the first rule could have said “10 times more likely” (50% / 5% = 10).
◼ The ratio in this kind of comparison is called lift.
◼ A key goal of an association rule mining exercise is to find rules that have the desired lift.
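In formula terms, lift(A → B) = confidence(A → B) / P(B) = P(A ∪ B) / (P(A) P(B)), using the same notation as the support definition, where P(A ∪ B) is the probability that a transaction contains both A and B. A minimal sketch, assuming transactions are Python sets; the ten baskets are made up, the function name `lift` is hypothetical, and the last line simply repeats the slide's arithmetic.

```python
def lift(antecedent, consequent, transactions):
    """Lift of antecedent -> consequent: the rule's confidence divided by P(consequent)."""
    n = len(transactions)
    p_a = sum(1 for t in transactions if antecedent <= t) / n
    p_b = sum(1 for t in transactions if consequent <= t) / n
    p_ab = sum(1 for t in transactions if (antecedent | consequent) <= t) / n
    if p_a == 0 or p_b == 0:
        raise ValueError("lift is undefined when an item set never occurs")
    confidence = p_ab / p_a
    return confidence / p_b  # compare against the baseline likelihood P(consequent)


# Hypothetical baskets: diapers in 4 of 10, beer in 2 of 10, both in 2 of 10.
transactions = (
    [{"diapers", "beer"} for _ in range(2)]
    + [{"diapers"} for _ in range(2)]
    + [{"milk"} for _ in range(6)]
)
# confidence(diapers -> beer) = 0.2 / 0.4 = 0.5; baseline P(beer) = 0.2; lift = 2.5
print(lift({"diapers"}, {"beer"}, transactions))

# The slide's numbers: 50% confidence against a 5% baseline is a lift of 10.
print(0.50 / 0.05)
```

A lift above 1 means the consequent is more likely than its baseline when the antecedent is present; a lift of exactly 1 means the two item sets occur independently.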