Course teaching resources for "Web Search and Mining Technology" (PPT slides), Lecture 1: Web Search Overview & Web Crawling

Web Search and Mining
Lecture 1: Web Search Overview & Web Crawling

[Figure: screenshot of a Google results page for the query "nigritude ultramarine" (about 185,000 results), with two regions labeled: algorithmic results and paid search ads (sponsored links).]

Search and Information Retrieval
▪ Search on the Web is a daily activity for many people throughout the world
▪ Search and communication are the most popular uses of the computer
▪ Applications involving search are everywhere
▪ The field of computer science that is most involved with R&D for search is information retrieval (IR)

Information Retrieval
▪ "Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information." (Salton, 1968)
▪ General definition that can be applied to many types of information and search applications
▪ Primary focus of IR since the 1950s has been on text and documents

What is a Document?
▪ Examples:
  ▪ web pages, email, books, news stories, scholarly papers, text messages, Word™, PowerPoint™, PDF, forum postings, patents, etc.
▪ Common properties
  ▪ Significant text content
  ▪ Some structure (e.g., title, author, date for papers; subject, sender, destination for email)

Documents vs. Database Records
▪ Database records (or tuples in relational databases) are typically made up of well-defined fields (or attributes)
  ▪ e.g., bank records with account numbers, balances, names, addresses, social security numbers, dates of birth, etc.
▪ Easy to compare fields with well-defined semantics to queries in order to find matches
▪ Text is more difficult

Documents vs. Records
▪ Example bank database query
  ▪ Find records with balance > $50,000 in branches located in Amherst, MA.
  ▪ Matches easily found by comparison with field values of records
▪ Example search engine query
  ▪ bank scandals in western mass
  ▪ This text must be compared to the text of entire news stories
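The contrast between the two queries can be sketched in a few lines of Python. This is a minimal illustration rather than anything from the lecture: the `Account` fields, the sample balances, and the second story text are all hypothetical. The database query reduces to comparisons on typed fields, while the search query has no field to compare against and has to be checked word by word against free text.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical record layout; a real bank schema would differ.
    account_id: int
    branch_city: str
    branch_state: str
    balance: float


accounts = [
    Account(1, "Amherst", "MA", 72000.0),
    Account(2, "Amherst", "MA", 1200.0),
    Account(3, "Boston", "MA", 98000.0),
]


def high_balance_in(records, city, state, minimum):
    """Database-style match: compare well-defined fields to query values."""
    return [
        r for r in records
        if r.branch_city == city and r.branch_state == state and r.balance > minimum
    ]


matched_ids = [r.account_id for r in high_balance_in(accounts, "Amherst", "MA", 50_000)]

# Search-engine side: the query is just text, compared against whole stories.
stories = [
    "Bank director in Amherst steals funds",
    "Western Mass farmers markets open this weekend",  # made-up story
]
query = "bank scandals in western mass"


def contains_all_terms(query_text, story_text):
    """Naive exact matching: every query word must appear in the story."""
    story_words = story_text.lower().split()
    return all(term in story_words for term in query_text.lower().split())


story_hits = [s for s in stories if contains_all_terms(query, s)]
```

With this sample data the field comparison selects exactly one account, while the naive word test returns no stories at all, even though the first story is plainly about a bank scandal in western Massachusetts.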

Comparing Text
▪ Comparing the query text to the document text and determining what is a good match is the core issue of information retrieval
▪ Exact matching of words is not enough
  ▪ Many different ways to write the same thing in a "natural language" like English
  ▪ e.g., does a news story containing the text "bank director in Amherst steals funds" match the query?
▪ Some stories will be better matches than others
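One way to see why exact word matching falls short, and why matches come in degrees, is to score each story by the fraction of query words it contains. The sketch below is an assumption-laden toy, not the ranking method of any real engine: `overlap_score` is a hypothetical helper, and the second and third story texts are invented for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, document):
    """Fraction of distinct query words that also occur in the document."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    return len(query_terms & set(tokenize(document))) / len(query_terms)


query = "bank scandals in western mass"
stories = [
    "Bank director in Amherst steals funds",
    "Western Mass bank scandals widen as audit continues",  # made-up story
    "Amherst farmers market opens for the season",          # made-up story
]

# Rank stories from best to worst match under this crude measure.
ranked = sorted(stories, key=lambda s: overlap_score(query, s), reverse=True)
```

The story from the slide shares only "bank" and "in" with the query, so it scores well below the story that repeats the query words, even though both are relevant. The measure knows nothing about "steals funds" being a kind of scandal or Amherst being in western Massachusetts, which is the kind of gap that retrieval models try to close.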

Dimensions of IR
▪ IR is more than just text, and more than just web search
  ▪ although these are central
▪ People doing IR work with different media, different types of search applications, and different tasks

Other Media
▪ New applications increasingly involve new media
  ▪ e.g., video, photos, music, speech
▪ Like text, content is difficult to describe and compare
  ▪ text may be used to represent them (e.g., tags)
▪ IR approaches to search and evaluation are appropriate
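The idea of representing non-text media by text can be sketched as follows, assuming each item carries a set of tags. The file names, tags, and the `search_by_tags` helper are hypothetical and stand in for whatever metadata a real collection would have. Once items are described by tags, the same word-overlap matching used for documents applies to them.

```python
# Hypothetical media collection: each item is described only by its tags.
media_tags = {
    "clip_001.mp4": {"beach", "sunset", "surfing"},
    "photo_017.jpg": {"sunset", "mountains"},
    "track_09.mp3": {"jazz", "piano"},
}


def search_by_tags(query, tagged_items):
    """Return item names ordered by how many query words match their tags."""
    terms = set(query.lower().split())
    scored = [(len(terms & tags), name) for name, tags in tagged_items.items()]
    # Highest overlap first; ties broken by name; drop items with no match.
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ordered if score > 0]


results = search_by_tags("sunset surfing", media_tags)
```

The search never looks at pixels or audio, only at the tags, so its quality depends entirely on how well the tags describe the content.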
