Web Search and Mining: course teaching resources (PPT lecture notes), Lecture 1: Web Search Overview & Web Crawling

Web Search and Mining
Lecture 1: Web Search Overview & Web Crawling

[Figure: screenshot of a Google results page for the query "nigritude ultramarine" (results 1-10 of about 185,000), with callouts marking the algorithmic results and the paid search ads.]

Search and Information Retrieval
▪ Search on the Web is a daily activity for many people throughout the world
▪ Search and communication are the most popular uses of the computer
▪ Applications involving search are everywhere
▪ The field of computer science that is most involved with R&D for search is information retrieval (IR)

Information Retrieval
▪ “Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information.” (Salton, 1968)
▪ General definition that can be applied to many types of information and search applications
▪ Primary focus of IR since the 1950s has been on text and documents

What is a Document?
▪ Examples:
  ▪ web pages, email, books, news stories, scholarly papers, text messages, Word™, PowerPoint™, PDF, forum postings, patents, etc.
▪ Common properties
  ▪ Significant text content
  ▪ Some structure (e.g., title, author, date for papers; subject, sender, destination for email)

Documents vs. Database Records
▪ Database records (or tuples in relational databases) are typically made up of well-defined fields (or attributes)
  ▪ e.g., bank records with account numbers, balances, names, addresses, social security numbers, dates of birth, etc.
▪ Easy to compare fields with well-defined semantics to queries in order to find matches
▪ Text is more difficult

Documents vs. Records
▪ Example bank database query
  ▪ Find records with balance > $50,000 in branches located in Amherst, MA.
  ▪ Matches easily found by comparison with field values of records
▪ Example search engine query
  ▪ bank scandals in western mass
  ▪ This text must be compared to the text of entire news stories
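The contrast between the two queries can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Account` record, the sample data, and both helper functions are hypothetical and do not come from the lecture. The database query reduces to comparing typed fields, while the search query has to be checked against the full text of each story.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical record layout: every field has a fixed, well-defined meaning.
    account_id: str
    balance: float
    branch_city: str
    branch_state: str


# Made-up sample data for illustration only.
ACCOUNTS = [
    Account("A-1", 72000.0, "Amherst", "MA"),
    Account("A-2", 12000.0, "Amherst", "MA"),
    Account("A-3", 98000.0, "Boston", "MA"),
]

STORIES = [
    "Bank director in Amherst steals funds",
    "Regulators investigate bank scandals in western Mass",
    "Local bakery wins state award",
]


def high_balance_in_amherst(accounts):
    """Database-style query: a record matches when its field values satisfy the conditions."""
    return [
        a for a in accounts
        if a.balance > 50_000 and a.branch_city == "Amherst" and a.branch_state == "MA"
    ]


def stories_containing_all_terms(query, stories):
    """Naive text query: a story matches only if it contains every query word."""
    terms = query.lower().split()
    return [s for s in stories if all(t in s.lower().split() for t in terms)]


if __name__ == "__main__":
    print(high_balance_in_amherst(ACCOUNTS))
    print(stories_containing_all_terms("bank scandals in western mass", STORIES))
```

With this sample data the field comparison is unambiguous, whereas the word-by-word text check skips the Amherst story even though a reader would probably consider it relevant to the query.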

Comparing Text
▪ Comparing the query text to the document text and determining what is a good match is the core issue of information retrieval
▪ Exact matching of words is not enough
  ▪ Many different ways to write the same thing in a “natural language” like English
  ▪ e.g., does a news story containing the text “bank director in Amherst steals funds” match the query?
▪ Some stories will be better matches than others
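One way to make "better matches than others" concrete is to score each story instead of accepting or rejecting it. The sketch below is a deliberately simple stand-in, not a method taken from the lecture: it assumes a hypothetical `overlap_score` that measures the fraction of distinct query words found in a document, and the example documents are invented.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, document):
    """Fraction of distinct query words that also occur in the document (0.0 to 1.0)."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    doc_terms = set(tokenize(document))
    return len(query_terms & doc_terms) / len(query_terms)


def rank(query, documents):
    """Order documents from highest to lowest score; ties keep their input order."""
    return sorted(documents, key=lambda doc: overlap_score(query, doc), reverse=True)


if __name__ == "__main__":
    query = "bank scandals in western mass"
    documents = [
        "Bank director in Amherst steals funds",
        "Bank scandals shake western Mass towns",
        "River bank erosion in Ohio",
    ]
    for doc in rank(query, documents):
        print(f"{overlap_score(query, doc):.1f}  {doc}")
```

Under this scoring the Amherst story and the unrelated river-bank story both share only "bank" and "in" with the query, so they receive the same score. That is the weakness of exact word matching: the score cannot tell that Amherst is in western Massachusetts or that stealing funds is a kind of scandal.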

Dimensions of IR
▪ IR is more than just text, and more than just web search
  ▪ although these are central
▪ People doing IR work with different media, different types of search applications, and different tasks

Other Media
▪ New applications increasingly involve new media
  ▪ e.g., video, photos, music, speech
▪ Like text, content is difficult to describe and compare
  ▪ text may be used to represent them (e.g., tags)
▪ IR approaches to search and evaluation are appropriate