Big Data Integration (PPT lecture slides)

Document information
Resource category: Document library
Format: PPTX
Pages: 109
File size: 9.87 MB
Summary
• Motivation
  – Why do we need big data integration?
  – How has “small” data integration been done?
  – Challenges in big data integration
• Schema alignment
• Record linkage
• Data fusion
• Emerging topics

Big Data Integration
Xin Luna Dong (Google Inc.)
Divesh Srivastava (AT&T Labs-Research)

What is “Big Data Integration?”
• Big data integration = Big data + data integration
• Data integration: easy access to multiple data sources [DHI12]
  – Virtual: mediated schema, query reformulation, link + fuse answers
  – Warehouse: materialized data, easy querying, consistency issues
• Big data: all about the V’s ☺
  – Size: large volume of data, collected and analyzed at high velocity
  – Complexity: huge variety of data, of questionable veracity
  – Utility: data of considerable value

What is “Big Data Integration?”
• Big data integration = Big data + data integration
• Data integration: easy access to multiple data sources [DHI12]
  – Virtual: mediated schema, query reformulation, link + fuse answers
  – Warehouse: materialized data, easy querying, consistency issues
• Big data in the context of data integration: still about the V’s ☺
  – Size: large volume of sources, changing at high velocity
  – Complexity: huge variety of sources, of questionable veracity
  – Utility: sources of considerable value

Outline
• Motivation
  – Why do we need big data integration?
  – How has “small” data integration been done?
  – Challenges in big data integration
• Schema alignment
• Record linkage
• Data fusion
• Emerging topics

Why Do We Need “Big Data Integration?”
• Building web-scale knowledge bases
  – Examples shown: Google Knowledge Graph, Freebase, ProBase (MSR knowledge base, “A Little Knowledge Goes a Long Way.”), NELL

Why Do We Need “Big Data Integration?”
• Reasoning over linked data

Why Do We Need “Big Data Integration?”
• Geo-spatial data fusion
http://axiomamuse.wordpress.com/2011/04/18/

Why Do We Need “Big Data Integration?”
• Scientific data analysis
[Figure: data categories shown include Genes, Genotypes, Disease Models, Expression, Recombinases (cre), Function, Pathways, Strains/SNPs, Orthology, Tumors]
http://scienceline.org/2012/01/from-index-cards-to-information-overload/

Outline
• Motivation
  – Why do we need big data integration?
  – How has “small” data integration been done?
  – Challenges in big data integration
• Schema alignment
• Record linkage
• Data fusion
• Emerging topics

“Small” Data Integration: What Is It?
• Data integration = solving lots of jigsaw puzzles
  – Each jigsaw puzzle (e.g., Taj Mahal) is an integrated entity
  – Each piece of a puzzle comes from some source
  – Small data integration → solving small puzzles
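
The jigsaw analogy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source names, the sample values, and the helper `assemble_entities` are hypothetical and do not come from the slides. It also assumes every piece already carries a clean, shared entity name, which sidesteps the schema alignment and record linkage steps listed in the outline.

```python
from collections import defaultdict

# Hypothetical "puzzle pieces": (source, entity, attribute, value).
# None of these sources or values appear in the original slides.
pieces = [
    ("source_a", "Taj Mahal", "city", "Agra"),
    ("source_b", "Taj Mahal", "country", "India"),
    ("source_c", "Taj Mahal", "type", "mausoleum"),
    ("source_a", "Eiffel Tower", "city", "Paris"),
]


def assemble_entities(pieces):
    """Group pieces by entity name; each entity is one assembled 'puzzle'."""
    entities = defaultdict(dict)
    for source, entity, attribute, value in pieces:
        # Keep the contributing source next to each value so the
        # origin of every piece stays visible after assembly.
        entities[entity][attribute] = (value, source)
    return dict(entities)


entities = assemble_entities(pieces)
for name, attributes in entities.items():
    print(name, attributes)
```

Here each entity is a small puzzle built from a handful of pieces. If two sources supplied the same attribute for one entity, this sketch would simply keep the later value; deciding between conflicting values is the concern of the data fusion topic in the outline.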
