The Hong Kong Polytechnic University: Data Warehousing & Data Mining (PPT lecture notes)

COMP 578 Data Warehousing & Data Mining
Keith C.C. Chan
Department of Computing
The Hong Kong Polytechnic University

Text and References
• Chan, K.C.C., Course Notes on Data Mining & Data Warehousing, Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, 2003.
• Inmon, W.H., Building the Data Warehouse, 2nd Edition, J. Wiley & Sons, New York, NY, 1996.
• Whitehorn, M., Business Intelligence: the IBM Solution: Datawarehousing and OLAP, Springer, London, 1999.
• Han, J., and Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, 2001.
• Rud, O.P., Data Mining Cookbook: Modeling Data for Marketing, Risk, and Customer Relationship Management, J. Wiley & Sons, New York, NY, 2001.
• Groth, R., Data Mining: Building Competitive Advantage, Prentice Hall, Upper Saddle River, NJ, 1998.
• Kovalerchuk, B., Data Mining in Finance: Advances in Relational and Hybrid Methods, Kluwer Academic, Boston, MA, 2000.
• Berry, M.J.A., Mastering Data Mining: the Art and Science of Customer Relationship Management, J. Wiley & Sons, New York, NY, 2000.
• Berry, M.J.A., Data Mining Techniques for Marketing, Sales and Customer Support, J. Wiley & Sons, New York, NY, 1997.
• Mattison, R., Data Warehousing and Data Mining for Telecommunications, Artech House, Boston, MA, 1997.

Course Outline (1)
• Data Mining
  – From data warehousing to data mining.
  – Data pre-processing and data mining life-cycle.
  – Association and sequence analysis; classification and clustering.
  – Fuzzy Logic, Neural Networks, and Genetic Algorithms.
  – Mining Complex Data:
    • OLAP mining; spatial data mining; text mining; time-series data mining; web mining; visual data mining.

Course Outline (2)
• Data Warehousing
  – Introduction; basic concepts of data warehousing; data warehouse vs. operational DB; data warehouse and the industry.
  – Architecture and design; two-tier and three-tier architecture; star schema and snowflake schema; data capturing, replication, transformation and cleansing.
  – Data characteristics; metadata; static and dynamic data; derived data.
  – Data marts; OLAP; data mining; data warehouse administration.

Aims and Objectives
• The hype about data warehousing and data mining.
• Better understanding of tools from IBM, Microsoft, Oracle, SAS, and SPSS.
• Job mobility and prospects.
• Projects and research theses.

Data Warehousing and Industry
• One of the hottest topics in IS.
• Over 90% of larger companies either have a DW or are starting one.
• Warehousing is big business:
  – $2 billion in 1995
  – $3.5 billion in early 1997
  – $8 billion in 1998 [Metagroup]
  – Over $200 billion over the next 5 years

Data Warehousing and Industry (2)
• A 1996 study of 62 data warehousing projects showed:
  – An average return on investment of 321%, with an average payback period of 2.73 years.
• WalMart has the largest warehouse:
  – 900-CPU, 2,700-disk, 23 TB Teradata system
  – ~7 TB in the warehouse
  – 40-50 GB per day

What is a Data Warehouse?
• Defined in many different ways, none of them rigorous:
  – A DB for decision support.
  – Maintained separately from an organization's operational database.
• "A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (W. H. Inmon)
• Data warehousing:
  – The process of constructing and using data warehouses.
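The four properties in Inmon's definition can be made concrete with a small in-memory model. The following is a minimal Python sketch, not a real warehouse implementation: it assumes a hypothetical "customer" subject area fed by two made-up operational sources (a billing system and a CRM system) that disagree on key format, region spelling, and currency unit. All names and data (`BILLING_ROWS`, `CRM_ROWS`, `CustomerSnapshot`, `integrate`, `CustomerWarehouse`) are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Hypothetical rows from two operational systems that disagree on
# key format, region spelling and currency unit (illustrative only).
BILLING_ROWS = [{"cust_no": "c001", "region": "KOWLOON", "balance_cents": 125000}]
CRM_ROWS = [{"customerId": "C002", "area": " hong kong ", "balance_hkd": 310.5}]


@dataclass(frozen=True)
class CustomerSnapshot:
    """A row in the warehouse's 'customer' subject area (subject-oriented)."""

    customer_id: str      # integrated: one key format for all sources
    region: str           # integrated: one spelling for all sources
    balance_hkd: float    # integrated: one currency unit for all sources
    snapshot_date: date   # time-variant: each row describes a point in time


def integrate(snapshot_date: date) -> list[CustomerSnapshot]:
    """Map both source formats onto the single warehouse schema."""
    rows = [
        CustomerSnapshot(r["cust_no"].upper(), r["region"].strip().title(),
                         r["balance_cents"] / 100, snapshot_date)
        for r in BILLING_ROWS
    ]
    rows += [
        CustomerSnapshot(r["customerId"].upper(), r["area"].strip().title(),
                         float(r["balance_hkd"]), snapshot_date)
        for r in CRM_ROWS
    ]
    return rows


class CustomerWarehouse:
    """Nonvolatile store: snapshots are appended and read, never updated."""

    def __init__(self) -> None:
        self._rows: list[CustomerSnapshot] = []

    def load(self, rows: list[CustomerSnapshot]) -> None:
        # A new load only adds rows; earlier snapshots are left untouched.
        self._rows.extend(rows)

    def history(self, customer_id: str) -> list[CustomerSnapshot]:
        # Time-variant query: all snapshots of one customer, oldest first.
        return sorted((r for r in self._rows if r.customer_id == customer_id),
                      key=lambda r: r.snapshot_date)


if __name__ == "__main__":
    wh = CustomerWarehouse()
    wh.load(integrate(date(2003, 1, 31)))
    wh.load(integrate(date(2003, 2, 28)))
    for row in wh.history("C001"):
        print(row.snapshot_date, row.region, row.balance_hkd)
```

Running the script prints two snapshots of customer C001, one per load date. In practice the integration step would be handled by ETL tooling and storage by a DBMS; the frozen dataclass and the append-only `load` method only imitate the "nonvolatile" property.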

Why Data Warehousing?
• Advances in information technology.
• Data collected in huge amounts.
• Need to make good use of the data.
• Architecture and tools to:
  – Bring together scattered information from multiple sources to provide a consistent data source for decision support.
  – Support information processing by providing a solid platform of consolidated, historical data for analysis.

Why Data Mining?
• Data explosion problem:
  – Automated data collection tools and mature database technology.
  – Leading to tremendous amounts of data stored in databases, data warehouses and other information repositories.
• We are drowning in data, but starving for knowledge!