Hong Kong University of Science and Technology: Record Linkage for Big Data

Record Linkage for Big Data
Slides from Luna Dong's VLDB Tutorial

Record Linkage
- Matching based on identifying content: color, pattern


Record Linkage: Three Steps [EIV07, GM12]
- Record linkage: blocking + pairwise matching + clustering
  - Scalability, similarity, semantics
- Diagram: Blocking, Pairwise Matching, Clustering

Record Linkage: Three Steps
- Blocking: efficiently create small blocks of similar records
  - Ensures scalability
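
The blocking step can be sketched in a few lines. This is only an illustration under stated assumptions, not the method from the tutorial: the blocking key (the first three letters of a lowercased name), the function names, and the toy records are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def blocking_key(record):
    # Hypothetical key: first three letters of the lowercased name.
    return record["name"].lower()[:3]

def build_blocks(records):
    # Group records that share a blocking key.
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)
    return dict(blocks)

def candidate_pairs(blocks):
    # Only records inside the same block are ever compared.
    for block in blocks.values():
        yield from combinations(block, 2)

# Toy records, made up for illustration.
records = [
    {"id": 1, "name": "Alpha Camera X1"},
    {"id": 2, "name": "alpha camera x1 black"},
    {"id": 3, "name": "Beta Phone"},
    {"id": 4, "name": "Beta phone 64GB"},
    {"id": 5, "name": "Gamma Tablet"},
]

blocks = build_blocks(records)
pairs = [(a["id"], b["id"]) for a, b in candidate_pairs(blocks)]
print(pairs)
```

Comparing all five toy records with each other would take 10 comparisons; with these blocks only 2 candidate pairs remain, which is the scalability benefit the slide refers to.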

Record Linkage: Three Steps
- Pairwise matching: compares all record pairs in a block
  - Computes similarity
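
As a rough sketch of pairwise matching, the example below scores every record pair in one block. Jaccard similarity over word tokens is an assumed choice (the slide does not name a similarity function), and the records and function names are hypothetical.

```python
from itertools import combinations

def tokens(text):
    # Lowercased whitespace tokens as a set.
    return set(text.lower().split())

def jaccard(a, b):
    # |intersection| / |union| of the two token sets.
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def score_block(block):
    # Compare all record pairs in the block and attach a similarity score.
    return [
        (a["id"], b["id"], jaccard(a["name"], b["name"]))
        for a, b in combinations(block, 2)
    ]

# Toy block, made up for illustration.
block = [
    {"id": 1, "name": "lumix digital camera black"},
    {"id": 2, "name": "Lumix Digital Camera silver"},
    {"id": 3, "name": "lumix camera case"},
]

scores = score_block(block)
for left, right, sim in scores:
    print(left, right, round(sim, 2))
```

Records 1 and 2 share three of five distinct tokens (similarity 0.6), while each pairing with record 3 shares two of five (0.4).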

Record Linkage: Three Steps
- Clustering: groups sets of records into entities
  - Ensures semantics
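
One simple way to turn pairwise scores into entities is to take connected components over the pairs whose score clears a threshold. The sketch below does that with union-find; it is an assumed stand-in, since the slide does not say which clustering algorithm is used, and the threshold and scores are hypothetical.

```python
def cluster(record_ids, scored_pairs, threshold=0.5):
    # Union-find over record ids; each root represents one entity.
    parent = {r: r for r in record_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), []).append(r)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical similarity scores for four records.
scored = [(1, 2, 0.9), (2, 3, 0.7), (3, 4, 0.2)]
clusters = cluster([1, 2, 3, 4], scored)
print(clusters)  # [[1, 2, 3], [4]]
```

Records 1 and 3 are never directly matched, yet they land in the same entity through record 2. Transitive closure of this kind is easy to compute but can over-merge, which is why other clustering methods are often preferred in practice.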

BDI: Record Linkage
- Volume: dealing with billions of records
  - Map-reduce based record linkage [VCL10, KTR12]
  - Adaptive record blocking [DNS+12, MKB12, VN12]
  - Blocking in heterogeneous data spaces [PIP+12, PKP+13]
- Velocity
  - Incremental record linkage [WGM10, WGM13]

BDI: Record Linkage
- Variety
  - Matching structured and unstructured data [KGA+11, KTT+12]
  - Matching Web tables and catalogs [LSC10]
- Veracity
  - Linking temporal records [LDM+11]

Matching with Unstructured Data
- Matching product offers: 1000s of stores, millions of products
  - Product offers are terse, unstructured text
  - Many similar but different product offers
[Figure: screenshot of three similar product offers for Panasonic Lumix digital cameras: DMC-SZ3, DMC-ZS25, and DMC-ZS8]
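
To illustrate why similar but different offers are hard to tell apart, the sketch below pulls a model code and a megapixel value out of terse offer titles and compares offers on the model code. The regular expressions, the function names, and the decision rule are assumptions made for this example, not the approach of the cited papers.

```python
import re

# Assumed patterns: a model code such as "DMC-ZS25" and a value such as "16.1 MP".
MODEL_RE = re.compile(r"\b[A-Z]{2,}-[A-Z]{1,3}\d+\b")
MEGAPIXEL_RE = re.compile(r"(\d+(?:\.\d+)?)\s*MP\b")

def extract(offer_title):
    # Turn terse offer text into a small structured record.
    model = MODEL_RE.search(offer_title)
    megapixels = MEGAPIXEL_RE.search(offer_title)
    return {
        "model": model.group(0) if model else None,
        "megapixels": float(megapixels.group(1)) if megapixels else None,
    }

def same_product(title_a, title_b):
    # Hypothetical rule: offers match only when both model codes exist and agree.
    a, b = extract(title_a), extract(title_b)
    return a["model"] is not None and a["model"] == b["model"]

zs25 = "Panasonic Lumix DMC-ZS25 16.1 MP Digital Camera"
zs8 = "Panasonic Lumix DMC-ZS8 14.1 MP Digital Camera"

print(extract(zs25))
print(same_product(zs25, zs8))
```

The two titles share most of their words, so a purely token-based similarity would score them as close, yet the extracted model codes show they are different products.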
