Web Search and Mining (Key Technologies of Web Search and Mining), course teaching resources (PPT lecture slides): Lecture 11, Probabilistic Information Retrieval

Web Search and Mining
Lecture 11: Probabilistic Information Retrieval

Recap of the last lecture
▪ Improving search results
  ▪ Especially for high recall. E.g., searching for aircraft so it matches with plane; thermodynamic with heat
▪ Options for improving results…
  ▪ Global methods
    ▪ Query expansion
      ▪ Thesauri
      ▪ Automatic thesaurus generation
    ▪ Global indirect relevance feedback
  ▪ Local methods
    ▪ Relevance feedback
    ▪ Pseudo relevance feedback

Probabilistic relevance feedback
▪ Rather than reweighting in a vector space…
▪ If the user has told us some relevant and some irrelevant documents, then we can proceed to build a probabilistic classifier, such as a Naive Bayes model:
  ▪ P(tk|R) = |Drk| / |Dr|
  ▪ P(tk|NR) = |Dnrk| / |Dnr|
  ▪ tk is a term; Dr is the set of known relevant documents; Drk is the subset that contain tk; Dnr is the set of known irrelevant documents; Dnrk is the subset that contain tk
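To make these estimates concrete, here is a minimal Python sketch of the two ratios on this slide; the document collections, the set-of-terms representation, and the function name are illustrative assumptions rather than anything prescribed by the lecture.

```python
# Sketch: maximum-likelihood estimates for the Naive Bayes relevance-feedback model,
# P(tk|R) = |Drk| / |Dr| and P(tk|NR) = |Dnrk| / |Dnr|.
# Each document is represented simply as a set of terms (an assumption for illustration).

def term_probabilities(term, relevant_docs, nonrelevant_docs):
    """Estimate P(term|R) and P(term|NR) from user-judged documents."""
    dr = len(relevant_docs)                               # |Dr|
    dnr = len(nonrelevant_docs)                           # |Dnr|
    drk = sum(1 for d in relevant_docs if term in d)      # |Drk|
    dnrk = sum(1 for d in nonrelevant_docs if term in d)  # |Dnrk|
    p_t_given_r = drk / dr if dr else 0.0
    p_t_given_nr = dnrk / dnr if dnr else 0.0
    return p_t_given_r, p_t_given_nr

# Example: the user judged three documents relevant and two non-relevant.
relevant = [{"aircraft", "plane"}, {"plane", "wing"}, {"aircraft", "engine"}]
nonrelevant = [{"heat", "thermodynamic"}, {"plane", "geometry"}]
print(term_probabilities("plane", relevant, nonrelevant))  # (0.666..., 0.5)
```

In practice one would smooth these counts (e.g., add-one) so unseen terms do not get probability zero, but the raw ratios above match the slide's formulas.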

Why probabilities in IR?
[Diagram: the user's information need is turned into a query representation (understanding of user need is uncertain); documents are turned into document representations (uncertain guess of whether a document has relevant content); the system must then match the two.]
▪ In traditional IR systems, matching between each document and query is attempted in a semantically imprecise space of index terms.
▪ Probabilities provide a principled foundation for uncertain reasoning.
▪ Can we use probabilities to quantify our uncertainties?

Probabilistic IR topics
▪ Classical probabilistic retrieval model
  ▪ Probability ranking principle, etc.
▪ Binary independence model
▪ Bayesian networks for text retrieval
▪ Language model approach to IR
  ▪ An important emphasis in recent work
▪ Probabilistic methods are one of the oldest but also one of the currently hottest topics in IR.
  ▪ Traditionally: neat ideas, but they’ve never won on performance. It may be different now.

The document ranking problem
▪ We have a collection of documents
▪ User issues a query
▪ A list of documents needs to be returned
▪ Ranking method is core of an IR system:
  ▪ In what order do we present documents to the user?
  ▪ We want the “best” document to be first, second best second, etc.
▪ Idea: Rank by probability of relevance of the document w.r.t. the information need
  ▪ P(relevant | document_i, query)

Probability basics
▪ Recall a few probability basics. For events a and b:
▪ Bayes’ Rule
  ▪ p(a, b) = p(a ∩ b) = p(a|b) p(b) = p(b|a) p(a)
  ▪ p(a|b) = p(b|a) p(a) / p(b) = p(b|a) p(a) / Σ_{x = a, ā} p(b|x) p(x)
    (p(a|b) is the posterior, p(a) the prior)
▪ Odds:
  ▪ O(a) = p(a) / p(ā) = p(a) / (1 − p(a))
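As a quick worked example of Bayes’ Rule and the odds definition above, the following Python snippet uses made-up numbers; the event interpretation and all values are assumptions for illustration only.

```python
# Let a = "document is relevant" and b = "document contains the query term" (assumed events).
p_a = 0.2                       # prior p(a)
p_b_given_a = 0.7               # likelihood p(b|a)
p_b_given_not_a = 0.1           # p(b|ā)

# Evidence by total probability: p(b) = p(b|a) p(a) + p(b|ā) p(ā)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior by Bayes' Rule: p(a|b) = p(b|a) p(a) / p(b)
p_a_given_b = p_b_given_a * p_a / p_b

# Odds: O(a) = p(a) / (1 - p(a))
odds_a = p_a / (1 - p_a)

print(round(p_a_given_b, 3))    # 0.636
print(round(odds_a, 3))         # 0.25
```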

The Probability Ranking Principle
▪ “If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, the overall effectiveness of the system to its user will be the best that is obtainable on the basis of those data.”
▪ [1960s/1970s] S. Robertson, W.S. Cooper, M.E. Maron; van Rijsbergen (1979:113); Manning & Schütze (1999:538)

Probability Ranking Principle
▪ Let x be a document in the collection.
▪ Let R represent relevance of a document w.r.t. a given (fixed) query and let NR represent non-relevance. (R = {0, 1} vs. NR/R)
▪ Need to find p(R|x), the probability that a document x is relevant:
  ▪ p(R|x) = p(x|R) p(R) / p(x)
  ▪ p(NR|x) = p(x|NR) p(NR) / p(x)
  ▪ p(R), p(NR): prior probability of retrieving a (non-)relevant document
  ▪ p(x|R), p(x|NR): probability that if a relevant (non-relevant) document is retrieved, it is x
  ▪ p(R|x) + p(NR|x) = 1
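One step the slide leaves implicit is why the two posteriors sum to one. A short check, assuming R and NR are mutually exclusive and exhaustive (so p(x) expands by total probability):

```latex
\[
  p(R|x) + p(NR|x)
  = \frac{p(x|R)\,p(R)}{p(x)} + \frac{p(x|NR)\,p(NR)}{p(x)}
  = \frac{p(x|R)\,p(R) + p(x|NR)\,p(NR)}{p(x)}
  = \frac{p(x)}{p(x)} = 1 .
\]
```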

Probability Ranking Principle (PRP)
▪ Simple case: no selection costs or other utility concerns that would differentially weight errors
▪ Bayes’ Optimal Decision Rule
  ▪ x is relevant iff p(R|x) > p(NR|x)
▪ PRP in action: Rank all documents by p(R|x)
▪ Theorem:
  ▪ Using the PRP is optimal, in that it minimizes the loss (Bayes risk) under 1/0 loss
  ▪ Provable if all probabilities correct, etc. [e.g., Ripley 1996]
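The following minimal Python sketch illustrates the PRP and the decision rule above; the documents and their posterior values p(R|x) are assumed numbers, not produced by any real retrieval model.

```python
# Assumed posteriors p(R|x) for three hypothetical documents.
docs = {"d1": 0.82, "d2": 0.35, "d3": 0.61}

# PRP in action: present documents in order of decreasing p(R|x).
ranking = sorted(docs, key=docs.get, reverse=True)
print(ranking)                                  # ['d1', 'd3', 'd2']

# Bayes' Optimal Decision Rule under 1/0 loss: x is relevant iff p(R|x) > p(NR|x).
# Since p(R|x) + p(NR|x) = 1, this is simply p(R|x) > 0.5.
relevant = [d for d, p in docs.items() if p > 0.5]
print(relevant)                                 # ['d1', 'd3']
```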