Shanghai Jiao Tong University: Mining Massive Datasets, course teaching resources (PPT lecture notes), Lecture 03: Frequent Itemsets and Association Rules

Mining Massive Datasets
Lecture 3: Frequent Itemsets and Association Rules
Wu-Jun Li
Department of Computer Science and Engineering
Shanghai Jiao Tong University

Outline
▪ Association rules
▪ A-Priori algorithm
▪ Large-scale algorithms

Association Rules

The Market-Basket Model
▪ A large set of items, e.g., things sold in a supermarket.
▪ A large set of baskets, each of which is a small set of the items, e.g., the things one customer buys on one day.

Example baskets (items per basket):
  Bread, Coke, Milk
  Beer, Bread
  Beer, Coke, Diaper, Milk
  Beer, Bread, Diaper, Milk
  Coke, Diaper, Milk

Market-Baskets (2)
▪ The model is really a general many-to-many mapping (association) between two kinds of things.
▪ But we ask about connections among “items,” not among “baskets.”
▪ The technology focuses on common events, not rare events (the “long tail”).

Association Rule Discovery
▪ Goal: to identify items that are bought together by sufficiently many customers, and to find dependencies among items.

Example baskets (items per basket):
  Bread, Coke, Milk
  Beer, Bread
  Beer, Coke, Diaper, Milk
  Beer, Bread, Diaper, Milk
  Coke, Diaper, Milk

Rules discovered:
  {Milk} --> {Coke}
  {Diaper, Milk} --> {Beer}
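The two discovered rules can be checked against the five example baskets by counting. The slide does not say how the rules were scored, so the sketch below only tallies, for each rule, how many baskets contain the left-hand side and how many contain both sides. The function name `rule_counts` is a hypothetical helper, not something from the lecture.

```python
# The five example baskets from the slide.
baskets = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

# The two rules listed on the slide, as (left-hand side, right-hand side).
rules = [
    ({"Milk"}, {"Coke"}),
    ({"Diaper", "Milk"}, {"Beer"}),
]


def rule_counts(lhs, rhs, baskets):
    """Return (baskets containing lhs, baskets containing lhs and rhs)."""
    with_lhs = sum(1 for basket in baskets if lhs <= basket)
    with_both = sum(1 for basket in baskets if (lhs | rhs) <= basket)
    return with_lhs, with_both


for lhs, rhs in rules:
    n_lhs, n_both = rule_counts(lhs, rhs, baskets)
    print(f"{sorted(lhs)} --> {sorted(rhs)}: "
          f"{n_both} of the {n_lhs} baskets with the left side also have the right side")
```

On this data, 3 of the 4 baskets containing Milk also contain Coke, and 2 of the 3 baskets containing both Diaper and Milk also contain Beer.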

Support
▪ Simplest question: find sets of items that appear “frequently” in the baskets.
▪ Support for itemset I = the number of baskets containing all items in I.
▪ Sometimes given as a percentage.
▪ Given a support threshold s, sets of items that appear in at least s baskets are called frequent itemsets.
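The definition of support translates almost directly into code. The following is a minimal sketch, assuming baskets are held in memory as Python sets and reusing the five supermarket baskets from the earlier slides; the names `support`, `support_fraction`, and `is_frequent` are illustrative choices, not part of the lecture.

```python
# The five supermarket baskets from the earlier slides.
baskets = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def support(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket)


def support_fraction(itemset, baskets):
    """Support expressed as a fraction of all baskets."""
    return support(itemset, baskets) / len(baskets)


def is_frequent(itemset, baskets, s):
    """True if itemset appears in at least s baskets."""
    return support(itemset, baskets) >= s


print(support({"Diaper", "Milk"}, baskets))           # count of baskets
print(support_fraction({"Diaper", "Milk"}, baskets))  # same, as a fraction
print(is_frequent({"Beer", "Bread"}, baskets, s=3))
```

With a threshold of s = 3, {Diaper, Milk} is frequent (it appears in 3 of the 5 baskets), while {Beer, Bread} is not (it appears in only 2).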

Example: Frequent Itemsets
▪ Items = {milk, coke, pepsi, beer, juice}.
▪ Support threshold = 3 baskets.
  B1 = {m, c, b}      B2 = {m, p, j}
  B3 = {m, b}         B4 = {c, j}
  B5 = {m, p, b}      B6 = {m, c, b, j}
  B7 = {c, b, j}      B8 = {b, c}
▪ Frequent itemsets: {m}, {c}, {b}, {j}, {m,b}, {b,c}, {c,j}.
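The example can be verified by brute force. The sketch below enumerates every non-empty subset of the five items and keeps those whose support reaches the threshold. This naive enumeration is only meant to check the slide on a tiny dataset; it is not the A-Priori algorithm listed in the outline, and the function name `frequent_itemsets` is an illustrative choice.

```python
from itertools import combinations

# Baskets B1..B8 from the slide (m=milk, c=coke, p=pepsi, b=beer, j=juice).
baskets = [
    {"m", "c", "b"},
    {"m", "p", "j"},
    {"m", "b"},
    {"c", "j"},
    {"m", "p", "b"},
    {"m", "c", "b", "j"},
    {"c", "b", "j"},
    {"b", "c"},
]


def frequent_itemsets(baskets, s):
    """Map each itemset with support >= s to its support (naive enumeration)."""
    items = sorted(set().union(*baskets))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for basket in baskets if set(candidate) <= basket)
            if count >= s:
                result[frozenset(candidate)] = count
    return result


found = frequent_itemsets(baskets, s=3)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With s = 3 this yields exactly the seven itemsets on the slide. No triple qualifies: the best candidates, {m, c, b} and {c, b, j}, each appear in only two baskets.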

Applications (1)
▪ Items = products; baskets = sets of products someone bought in one trip to the store.
▪ Example application: given that many people buy beer and diapers together:
  ▪ Run a sale on diapers; raise the price of beer.
▪ This is only useful if many customers buy diapers and beer together.

Applications (2)
▪ Baskets = sentences; items = documents containing those sentences.
▪ Items that appear together too often could represent plagiarism.
▪ Notice that items do not have to be “in” baskets: here each basket (a sentence) is associated with the documents (items) that contain it.
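The inverted mapping in this application can be sketched as follows. The toy `documents` dictionary, the threshold of 2 shared sentences, and the helper names are all assumptions made for illustration; the slide specifies only that sentences play the role of baskets and documents the role of items.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy corpus: document id -> sentences it contains.
documents = {
    "doc_a": ["the cat sat", "dogs bark loudly", "rain falls"],
    "doc_b": ["the cat sat", "dogs bark loudly", "sun shines"],
    "doc_c": ["rain falls", "birds sing"],
}


def sentence_baskets(documents):
    """Invert the corpus: each sentence (basket) maps to the documents (items) containing it."""
    baskets = defaultdict(set)
    for doc_id, sentences in documents.items():
        for sentence in sentences:
            baskets[sentence].add(doc_id)
    return baskets


def shared_sentence_counts(documents):
    """Support of each document pair: the number of sentences both documents contain."""
    pair_counts = Counter()
    for docs in sentence_baskets(documents).values():
        for pair in combinations(sorted(docs), 2):
            pair_counts[pair] += 1
    return pair_counts


counts = shared_sentence_counts(documents)
# Document pairs sharing at least 2 sentences are flagged for a closer look.
suspicious = {pair for pair, n in counts.items() if n >= 2}
print(suspicious)
```

In this toy corpus only the pair (doc_a, doc_b) shares two sentences, so it is the only pair flagged. A real system would need sentence normalization and a threshold tuned to the corpus.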

Applications (3)
▪ Baskets = Web pages; items = words.
▪ Unusual words appearing together in a large number of documents, e.g., “Brad” and “Angelina,” may indicate an interesting relationship.