《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 10 Big Data

Motivation Very large volumes of data being collected Driven by growth of web,social media,and more recently internet-of- things Web logs were an early source of data Analytics on web logs has great value for advertisements,web site structuring,what posts to show to a user,etc ■ Big Data:differentiated from data handled by earlier generation databases Volume:much larger amounts of data stored Velocity:much higher rates of insertions Variety:many types of data,beyond relational data Database System Concepts-7th Edition 10.2 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 10.2 ©Silberschatz, Korth and Sudarshan th Edition Motivation ▪ Very large volumes of data being collected • Driven by growth of web, social media, and more recently internet-ofthings • Web logs were an early source of data ▪ Analytics on web logs has great value for advertisements, web site structuring, what posts to show to a user, etc ▪ Big Data: differentiated from data handled by earlier generation databases • Volume: much larger amounts of data stored • Velocity: much higher rates of insertions • Variety: many types of data, beyond relational data

Querying Big Data Transaction processing systems that need very high scalability Many applications willing to sacrifice ACID properties and other database features,if they can get very high scalability Query processing systems that Need very high scalability,and Need to support non-relation data Database System Concepts-7th Edition 10.3 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 10.3 ©Silberschatz, Korth and Sudarshan th Edition Querying Big Data ▪ Transaction processing systems that need very high scalability • Many applications willing to sacrifice ACID properties and other database features, if they can get very high scalability ▪ Query processing systems that • Need very high scalability, and • Need to support non-relation data

Big Data Storage Systems Distributed file systems Shardring across multiple databases Key-value storage systems Parallel and distributed databases Database System Concepts-7th Edition 10.4 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 10.4 ©Silberschatz, Korth and Sudarshan th Edition Big Data Storage Systems ▪ Distributed file systems ▪ Shardring across multiple databases ▪ Key-value storage systems ▪ Parallel and distributed databases

Distributed File Systems A distributed file system stores data across a large collection of machines, but provides single file-system view Highly scalable distributed file system for large data-intensive applications. E.g.,10K nodes,100 million files,10 PB Provides redundant storage of massive amounts of data on cheap and unreliable computers Files are replicated to handle hardware failure Detect failures and recovers from them Examples: Google File System(GFS) Hadoop File System (HDFS) Database System Concepts-7th Edition 10.5 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 10.5 ©Silberschatz, Korth and Sudarshan th Edition Distributed File Systems ▪ A distributed file system stores data across a large collection of machines, but provides single file-system view ▪ Highly scalable distributed file system for large data-intensive applications. • E.g., 10K nodes, 100 million files, 10 PB ▪ Provides redundant storage of massive amounts of data on cheap and unreliable computers • Files are replicated to handle hardware failure • Detect failures and recovers from them ▪ Examples: • Google File System (GFS) • Hadoop File System (HDFS)

Hadoop File System Architecture Single Namespace for entire NameNode Metadata(name,replicas,..) cluster Metadata Ops Files are broken up into BackupNode blocks Metadata(name,replicas..) Client Typically 64 MB block size Block Read DataNodes Each block replicated on multiple DataNodes L Blocks Client 。 Finds location of blocks Client from NameNode Block Write Accesses data directly Replication from DataNode Rack 1 Rack 2 Database System Concepts-7th Edition 10.6 @Silberschatz,Korth and Sudarshan
Database System Concepts - 7 10.6 ©Silberschatz, Korth and Sudarshan th Edition Hadoop File System Architecture ▪ Single Namespace for entire cluster ▪ Files are broken up into blocks • Typically 64 MB block size • Each block replicated on multiple DataNodes ▪ Client • Finds location of blocks from NameNode • Accesses data directly from DataNode

Hadoop Distributed File System(HDFS) NameNode Maps a filename to list of Block IDs Maps each Block ID to DataNodes containing a replica of the block DataNode:Maps a Block ID to a physical location on disk Data Coherency Write-once-read-many access model Client can only append to existing files Distributed file systems good for millions of large files But have very high overheads and poor performance with billions of smaller tuples Database System Concepts-7th Edition 10.7 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 10.7 ©Silberschatz, Korth and Sudarshan th Edition Hadoop Distributed File System (HDFS) ▪ NameNode • Maps a filename to list of Block IDs • Maps each Block ID to DataNodes containing a replica of the block ▪ DataNode: Maps a Block ID to a physical location on disk ▪ Data Coherency • Write-once-read-many access model • Client can only append to existing files ▪ Distributed file systems good for millions of large files • But have very high overheads and poor performance with billions of smaller tuples

Sharding Sharding:partition data across multiple databases Partitioning usually done on some partitioning attributes(also known as partitioning keys or shard keys e.g.user ID E.g.,records with key values from 1 to 100,000 on database 1, records with key values from 100,001 to 200,000 on database 2,etc. Application must track which records are on which database and send queries/updates to that database Positives:scales well,easy to implement Drawbacks: Not transparent:application has to deal with routing of queries, queries that span multiple databases When a database is overloaded,moving part of its load out is not easy Chance of failure more with more databases need to keep replicas to ensure availability,which is more work for application Database System Concepts-7th Edition 10.8 @Silberschatz,Korth and Sudarshan
Database System Concepts - 7 10.8 ©Silberschatz, Korth and Sudarshan th Edition Sharding ▪ Sharding: partition data across multiple databases ▪ Partitioning usually done on some partitioning attributes (also known as partitioning keys or shard keys e.g. user ID • E.g., records with key values from 1 to 100,000 on database 1, records with key values from 100,001 to 200,000 on database 2, etc. ▪ Application must track which records are on which database and send queries/updates to that database ▪ Positives: scales well, easy to implement ▪ Drawbacks: • Not transparent: application has to deal with routing of queries, queries that span multiple databases • When a database is overloaded, moving part of its load out is not easy • Chance of failure more with more databases ▪ need to keep replicas to ensure availability, which is more work for application

Key Value Storage Systems Key-value storage systems store large numbers(billions or even more)of small (KB-MB)sized records Records are partitioned across multiple machines and Queries are routed by the system to appropriate machine Records are also replicated across multiple machines,to ensure availability even if a machine fails Key-value stores ensure that updates are applied to all replicas,to ensure that their values are consistent Database System Concepts-7th Edition 10.10 @Silberschatz,Korth and Sudarshan
Database System Concepts - 7 10.10 ©Silberschatz, Korth and Sudarshan th Edition Key Value Storage Systems ▪ Key-value storage systems store large numbers (billions or even more) of small (KB-MB) sized records ▪ Records are partitioned across multiple machines and ▪ Queries are routed by the system to appropriate machine ▪ Records are also replicated across multiple machines, to ensure availability even if a machine fails • Key-value stores ensure that updates are applied to all replicas, to ensure that their values are consistent

Key Value Storage Systems Key-value stores may store uninterpreted bytes,with an associated key E.g.,Amazon S3,Amazon Dynamo Wide-table(can have arbitrarily many attribute names)with associated key Google BigTable,Apache Cassandra,Apache Hbase, Amazon DynamoDB Allows some operations(e.g.,filtering)to execute on storage node ·JSON MongoDB,CouchDB(document model) Document stores store semi-structured data,typically JSON Some key-value stores support multiple versions of data,with timestamps/version numbers Database System Concepts-7th Edition 10.11 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 10.11 ©Silberschatz, Korth and Sudarshan th Edition Key Value Storage Systems ▪ Key-value stores may store • uninterpreted bytes, with an associated key ▪ E.g., Amazon S3, Amazon Dynamo • Wide-table (can have arbitrarily many attribute names) with associated key • Google BigTable, Apache Cassandra, Apache Hbase, Amazon DynamoDB • Allows some operations (e.g., filtering) to execute on storage node • JSON ▪ MongoDB, CouchDB (document model) ▪ Document stores store semi-structured data, typically JSON ▪ Some key-value stores support multiple versions of data, with timestamps/version numbers

Data Representation An example of a JSON object is: { "ID":"22222", "name": "firstname:"Albert", "lastname:"Einstein" }, "deptname":"Physics", "children": {"firstname":"Hans","lastname":"Einstein"}, "firstname":"Eduard","lastname":"Einstein"} } Database System Concepts-7th Edition 10.12 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 10.12 ©Silberschatz, Korth and Sudarshan th Edition Data Representation ▪ An example of a JSON object is: { "ID": "22222", "name": { "firstname: "Albert", "lastname: "Einstein" }, "deptname": "Physics", "children": [ { "firstname": "Hans", "lastname": "Einstein" }, { "firstname": "Eduard", "lastname": "Einstein" } ] }
按次数下载不扣除下载券;
注册用户24小时内重复下载只扣除一次;
顺序:VIP每日次数-->可用次数-->下载券;
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 1 Introduction(Avi Silberschatz Henry F. Korth S. Sudarshan).pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第五版,PPT课件讲稿,英文版)Chapter 9 Object-Based Databases.ppt
- 《数据库系统概念 Database System Concepts》原书教学资源(第五版,PPT课件讲稿,英文版)Chapter 8 Application Design and Development.ppt
- 《数据库系统概念 Database System Concepts》原书教学资源(第五版,PPT课件讲稿,英文版)Chapter 7 Relational Database Design.ppt
- 《数据库系统概念 Database System Concepts》原书教学资源(第五版,PPT课件讲稿,英文版)Chapter 6 Entity-Relationship Model.ppt
- 《数据库系统概念 Database System Concepts》原书教学资源(第五版,PPT课件讲稿,英文版)Chapter 5 Other Relational Languages.ppt
- 《数据库系统概念 Database System Concepts》原书教学资源(第五版,PPT课件讲稿,英文版)Chapter 4 Advanced SQL.ppt
- 《数据库系统概念 Database System Concepts》原书教学资源(第五版,PPT课件讲稿,英文版)Chapter 3 SQL.ppt
- 《数据库系统概念 Database System Concepts》原书教学资源(第五版,PPT课件讲稿,英文版)Chapter Advanced Transaction Processing.ppt
- 《数据库系统概念 Database System Concepts》原书教学资源(第五版,PPT课件讲稿,英文版)Chapter 24 Advanced Data Types.ppt
- 《数据库系统概念 Database System Concepts》原书教学资源(第五版,PPT课件讲稿,英文版)Chapter 23 Advanced Application Development.ppt
- 《数据库系统概念 Database System Concepts》原书教学资源(第五版,PPT课件讲稿,英文版)Chapter 22 Distributed Databases.ppt
- 《数据库系统概念 Database System Concepts》原书教学资源(第五版,PPT课件讲稿,英文版)Chapter 21 Parallel Databases.ppt
- 《数据库系统概念 Database System Concepts》原书教学资源(第五版,PPT课件讲稿,英文版)Chapter 20 Database System Architectures.ppt
- 《数据库系统概念 Database System Concepts》原书教学资源(第五版,PPT课件讲稿,英文版)Chapter 2 Relational Model.ppt
- 《数据库系统概念 Database System Concepts》原书教学资源(第五版,PPT课件讲稿,英文版)Chapter 19 Information Retrieval.ppt
- 《数据库系统概念 Database System Concepts》原书教学资源(第五版,PPT课件讲稿,英文版)Chapter 18 Data Analysis and Mining.ppt
- 《数据库系统概念 Database System Concepts》原书教学资源(第五版,PPT课件讲稿,英文版)Chapter 17 Recovery System.ppt
- 《数据库系统概念 Database System Concepts》原书教学资源(第五版,PPT课件讲稿,英文版)Chapter 16 Concurrency Control.ppt
- 《数据库系统概念 Database System Concepts》原书教学资源(第五版,PPT课件讲稿,英文版)Chapter 15 Transactions.ppt
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 11 Data Analytics.pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 12 Physical Storage Systems.pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 13 Data Storage Structures.pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 14 Indexing.pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 15 Query Processing.pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 16 Query Optimization.pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 17 Transactions.pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 18 Concurrency Control.pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 19 Recovery System.pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 2 Intro to Relational Model.pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 20 Database System Architectures.pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 21 Parallel and Distributed Storage.pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 22 Parallel and Distributed Query Processing.pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 23 Parallel and Distributed Transaction Processing.pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 24 Advanced Indexing.pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 25 Advanced Application Development.pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 26 Blockchain Databases.pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 27 Formal-Relational Query Languages.pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 28 Advanced Relational Database Design.pptx
- 《数据库系统概念 Database System Concepts》原书教学资源(第七版,PPT课件讲稿,英文版)Chapter 29 Object-Based Databases.pptx