Business Intelligence, Analytics, and Data Science: A Managerial Perspective, teaching resources (PPT slides, 3rd edition), Chapter 05: Text and Web Analytics

Chapter 5: Text and Web Analytics
Business Intelligence: A Managerial Perspective on Analytics (3rd Edition)

Copyright © 2014 Pearson Education, Inc.
Slide 5-2: Learning Objectives
▪ Describe text mining and understand the need for text mining
▪ Differentiate between text mining, Web mining, and data mining
▪ Understand the different application areas for text mining
▪ Know the process of carrying out a text mining project
▪ Understand the different methods to introduce structure to text-based data
(Continued…)

Slide 5-3: Learning Objectives
▪ Describe Web mining, its objectives, and its benefits
▪ Understand the three different branches of Web mining
  ▪ Web content mining
  ▪ Web structure mining
  ▪ Web usage mining
▪ Understand the applications of these three mining paradigms

Slide 5-4: Opening Vignette
Machine Versus Men on Jeopardy!: The Story of Watson
▪ Situation
▪ Problem
▪ Solution
▪ Results
▪ Answer and discuss the case questions
Watch it on YouTube!

Slide 5-5: Questions for the Opening Vignette
1. What is Watson? What is special about it?
2. What technologies were used in building Watson (both hardware and software)?
3. What are the innovative characteristics of DeepQA architecture that made Watson superior?
4. Why did IBM spend all that time and money to build Watson? Where is the ROI?

Slide 5-6: A High-Level Depiction of IBM Watson’s DeepQA Architecture
[Figure: pipeline from Question to Answer and confidence. Stages shown: Question analysis, Query decomposition, Hypothesis generation, Soft filtering, Hypothesis and evidence scoring, Synthesis, Final merging and ranking. Supporting components: Answer sources (Primary search, Candidate answer generation), Evidence sources (Support evidence retrieval, Deep evidence scoring), Trained models.]
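
The stage names in the figure can be made concrete with a toy sketch. This is not IBM's implementation: the evidence store, the stop-word list, the keyword-overlap scoring, and the function names (`analyze_question`, `score_evidence`, `answer`) are all assumptions chosen only to show how hypotheses might be generated, softly filtered, scored against evidence, and ranked with a confidence value.

```python
import re

# Hypothetical evidence store: candidate answer -> passages that mention it.
EVIDENCE = {
    "Toronto": ["Toronto is the largest city in Canada."],
    "Chicago": [
        "Chicago's largest airport is named for a World War II hero.",
        "Chicago's second airport is named for a World War II battle.",
    ],
    "Boston": ["Boston hosts a famous marathon."],
}
STOP_WORDS = {"a", "an", "the", "is", "for", "its", "in", "of"}


def analyze_question(text):
    """Question analysis: reduce text to a set of content keywords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def score_evidence(keywords, passages):
    """Evidence scoring: total keyword overlap across a candidate's passages."""
    return sum(len(keywords & analyze_question(p)) for p in passages)


def answer(question, min_score=1):
    """Return (candidate, confidence) pairs, best first."""
    keywords = analyze_question(question)
    # Hypothesis generation: every known candidate starts as a hypothesis.
    scored = {c: score_evidence(keywords, ps) for c, ps in EVIDENCE.items()}
    # Soft filtering: drop hypotheses with too little support.
    kept = {c: s for c, s in scored.items() if s >= min_score}
    if not kept:
        return []
    # Final merging and ranking: normalize scores into rough confidences.
    total = sum(kept.values())
    return sorted(((c, s / total) for c, s in kept.items()), key=lambda x: -x[1])


ranking = answer("Its largest airport is named for a World War II hero")
for candidate, confidence in ranking:
    print(f"{candidate}: {confidence:.2f}")
```

The real system combines many scoring algorithms and trained models; the single overlap count here only stands in for that step.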

Slide 5-7: Text Mining Concepts
▪ An estimated 85-90 percent of all corporate data is in some kind of unstructured form (e.g., text)
▪ Unstructured corporate data is doubling in size every 18 months
▪ Tapping into these information sources is not optional; it is necessary to stay competitive
▪ Answer: text mining
  ▪ A semi-automated process of extracting knowledge from unstructured data sources
  ▪ a.k.a. text data mining or knowledge discovery in textual databases

Slide 5-8: Data Mining versus Text Mining
▪ Both seek novel and useful patterns
▪ Both are semi-automated processes
▪ The difference is the nature of the data:
  ▪ Structured versus unstructured data
  ▪ Structured data: in databases
  ▪ Unstructured data: Word documents, PDF files, text excerpts, XML files, and so on
▪ Text mining: first impose structure on the data, then mine the structured data
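
The two-step idea on this slide (impose structure, then mine) can be sketched with a term-document matrix. The sample documents, the stop-word list, and the function names below are hypothetical, and counting document frequency is only one simple stand-in for the mining step.

```python
import re
from collections import Counter

# Hypothetical mini-corpus standing in for unstructured documents.
documents = [
    "The customer praised the fast delivery.",
    "Delivery was slow and the customer complained.",
    "Fast refund, happy customer.",
]

STOP_WORDS = {"the", "and", "was", "a", "an"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def term_document_matrix(docs):
    """Step 1, impose structure: return (sorted vocabulary, one count row per document)."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


def document_frequency(vocabulary, matrix):
    """Step 2, mine the structured data: count how many documents contain each term."""
    return {
        term: sum(1 for row in matrix if row[j] > 0)
        for j, term in enumerate(vocabulary)
    }


vocabulary, matrix = term_document_matrix(documents)
df = document_frequency(vocabulary, matrix)
print(vocabulary)
print(df)
```

Once the text is in matrix form, any conventional data mining method that accepts numeric rows can be applied to it.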

Slide 5-9: Text Mining Concepts
▪ Benefits of text mining are obvious, especially in text-rich data environments
  ▪ e.g., law (court orders), academic research (research articles), finance (quarterly reports), medicine (discharge summaries), biology (molecular interactions), technology (patent files), marketing (customer comments), etc.
▪ Electronic communication records (e.g., Email)
  ▪ Spam filtering
  ▪ Email prioritization and categorization
  ▪ Automatic response generation
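
As an illustration of the spam filtering item, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. The slides do not name a method, so the choice of naive Bayes is an assumption, as are the four training emails and the function names; a usable filter would need far more data.

```python
import math
import re
from collections import Counter

# Hypothetical labeled emails; a real filter would need far more training data.
training = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the quarterly report", "ham"),
]


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per label and emails per label; collect the vocabulary."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability under naive Bayes."""
    word_counts, label_counts, vocabulary = model
    total_emails = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_emails)
        for w in tokenize(text):
            if w in vocabulary:  # ignore words never seen in training
                # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
                score += math.log((word_counts[label][w] + 1) / (n_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(training)
print(classify("claim your free prize", model))
print(classify("quarterly report for monday meeting", model))
```

The same structure extends to email categorization by replacing the two labels with folder or priority names.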

Slide 5-10: Text Mining Application Areas
▪ Information extraction
▪ Topic tracking
▪ Summarization
▪ Categorization
▪ Clustering
▪ Concept linking
▪ Question answering
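
Of the areas listed, information extraction is perhaps the simplest to show in a few lines. The sketch below pulls structured fields out of free text with regular expressions. The patterns, field names, and sample sentence are hypothetical, and production extractors usually rely on trained models rather than hand-written rules.

```python
import re

# Hypothetical patterns for three field types.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "amount": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def extract(text):
    """Return a dict mapping each field name to the list of matches found in text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


note = "Invoice sent to jane.doe@example.com on 2014-03-15 for $1,250.00."
record = extract(note)
print(record)
```

The resulting dictionary is a structured record that could be loaded into a database table and analyzed with ordinary data mining tools.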
