Tongji University: "Big Data Analysis and Mining" course teaching resources (PPT lecture slides), Platforms for Big Data Mining (Lecturer: Weixiong Rao 饶卫雄)

Big Data Analysis and Mining
Weixiong Rao (饶卫雄)
School of Software Engineering, Tongji University (同济大学软件学院)
2015 Fall
wxrao@tongji.edu.cn
*Some of the slides are from Dr. Jure Leskovec and Prof. Zachary G. Ives.
2021/1/30

Traditional DAM
[Figure: traditional data warehouse pipeline. Raw data from operational sources (Oracle DB, SAP ERP, Salesforce CRM, flat files from legacy systems) passes through ETL (Extraction, Transformation, Loading) into a data warehouse, which feeds OLAP analysis, reporting, and data mining. Slide annotations: "IBM DW product on very powerful servers" and "DAM tools". Figure (C) 2008 datawarehouse4u.info]

Big Data
◼ Typical large enterprise:
  ◆ 5,000-50,000 servers, terabytes of data, millions of transactions (Txn) per day
◼ In contrast, many Internet companies:
  ◆ Millions of servers, petabytes of data
  ◆ Google: lots and lots of Web pages; billions of Google queries per day
  ◆ Facebook: a billion Facebook users; a billion+ Facebook pages
  ◆ Twitter: hundreds of millions of Twitter accounts; hundreds of millions of Tweets per day

Nowadays DAM solutions
◼ Google, Facebook, LinkedIn, eBay, Amazon... did not use traditional data warehouse products for DAM.
◼ Why? CAP theorem
  ◆ Different assumptions lead to different solutions
◼ What?
  ◆ Massive parallelism: the Hadoop MapReduce paradigm; UC Berkeley Shark/Spark ("Lightning-fast cluster computing")
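The MapReduce paradigm named on this slide can be illustrated with a word count. The following is a minimal single-process sketch in Python, not Hadoop's actual API: the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical, and a real cluster would run the map and reduce calls in parallel on many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all counts emitted for one word."""
    return key, sum(values)

def word_count(documents):
    # Each document could be mapped on a different machine; here we just loop.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data mining", "big data analysis"])
print(counts)
```

Because each `map_phase` call sees only one document and each `reduce_phase` call sees only one key, the work splits cleanly across machines, which is the sense in which the paradigm offers massive parallelism.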

What's DAM?
◼ Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision making.
◼ Data mining is a particular data analysis technique that focuses on modeling and knowledge discovery for predictive rather than purely descriptive purposes.

What's big DAM?
◼ Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
  ◆ The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization
◼ Our course: how to do DAM in the big data context
  ◆ Data Mining ≈ Predictive Analytics ≈ Data Science ≈ Business Intelligence
  ◆ Big data mining ≈ Massive data analysis

Let's focus on big DAM: what matters when dealing with data?
[Figure: diagram with the labels Challenges, Usage, Context, Streaming, Scalability, Collect, Data Modalities, Reason, Data Operators]

Let's focus on big DAM: cultures of data mining
◼ Data mining overlaps with:
  ◆ Databases: large-scale data, simple queries
  ◆ Machine learning: small data, complex models
  ◆ CS Theory: (randomized) algorithms
◼ Different cultures:
  ◆ To a DB person, data mining is an extreme form of analytic processing: queries that examine large amounts of data. The result is the query answer.
  ◆ To an ML person, data mining is the inference of models. The result is the parameters of the model.
[Figure: diagram placing Data Mining at the overlap of Statistics, Machine Learning, and Database systems]
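The contrast between the two cultures can be made concrete on one tiny data set. The sketch below is an illustration only and is not taken from the slides: the table name `points`, the sample data, and the choice of a least-squares line as the "model" are all assumptions. It uses Python's standard `sqlite3` module for the database view and plain arithmetic for the model fit.

```python
import sqlite3

# Hypothetical toy data lying exactly on the line y = 2x + 1.
rows = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

# Database view: the result of mining is the answer to a query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany("INSERT INTO points VALUES (?, ?)", rows)
query_answer = conn.execute(
    "SELECT AVG(y) FROM points WHERE x >= 2"
).fetchone()[0]
conn.close()

# Machine learning view: the result is the parameters of a fitted model.
n = len(rows)
mean_x = sum(x for x, _ in rows) / n
mean_y = sum(y for _, y in rows) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
         / sum((x - mean_x) ** 2 for x, _ in rows))
intercept = mean_y - slope * mean_x

print("query answer:", query_answer)
print("model parameters:", slope, intercept)
```

The query returns a value computed from the stored rows, while the fit returns two numbers that describe the data and can be used to predict `y` for an unseen `x`.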

Let's focus on big data mining
◼ This class overlaps with machine learning, statistics, artificial intelligence, and databases, but puts more stress on:
  ◆ Scalability (big data)
  ◆ Algorithms
  ◆ Computing architectures
  ◆ Automation for handling real big data
◼ The required background:
  ◆ Data structures and algorithm design
  ◆ Probability and linear algebra
  ◆ Operating systems
  ◆ Java program design
[Figure: diagram placing Data Mining at the overlap of Statistics, Machine Learning, and Database systems]

What will we learn?
◼ We will learn to mine different types of data:
  ◆ Data is high dimensional
  ◆ Data is a graph
  ◆ Data is infinite/never-ending
  ◆ Data is labeled
◼ We will learn to use different models of computation:
  ◆ Matlab + Hadoop + Spark
  ◆ Streams and online algorithms
  ◆ Single machine in-memory
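To give a flavour of what "streams and online algorithms" means for infinite or never-ending data, here is a minimal sketch of one well-known online algorithm, Welford's running mean and variance. The slides do not say which algorithms the course covers, so this choice and the class name `RunningStats` are assumptions for illustration.

```python
class RunningStats:
    """Online mean and population variance in O(1) memory (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Consume one stream element without storing it."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the elements seen so far."""
        return self._m2 / self.count if self.count else 0.0

stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # stands in for an unbounded stream
    stats.update(value)
print(stats.count, stats.mean, stats.variance)
```

Each element is seen once and then discarded, so memory use stays constant however long the stream runs, in contrast to the single-machine in-memory model, where the whole data set must fit in RAM.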