《网络搜索和挖掘关键技术 Web Search and Mining》课程教学资源(PPT讲稿)Lecture 08 Scoring and results assembly

Computing Scores in a Complete Search System Web Search and Mining Lecture 8: Scoring and results assembly
Computing Scores in a Complete Search System Lecture 8: Scoring and results assembly Web Search and Mining 1

Computing Scores in a Complete Search System Recap: tf-idf weighting The tf-idf weight of a term is the product of its tf weight and its idf weight W,=(1+log tf, d)logo(N/df, Best known weighting scheme in information retrieval Increases with the number of occurrences within a document a Increases with the rarity of the term in the collection
Computing Scores in a Complete Search System Recap: tf-idf weighting ▪ The tf-idf weight of a term is the product of its tf weight and its idf weight. ▪ Best known weighting scheme in information retrieval ▪ Increases with the number of occurrences within a document ▪ Increases with the rarity of the term in the collection w (1 log tf ) log ( /df ) , t,d 10 N t t d = + 2

Computing Scores in a Complete Search System Recap: Queries as vectors Key idea 1: Do the same for queries: represent them as vectors in the space Key idea 2: Rank documents according to their proximity to the query in this space proximity similarity of vectors
Computing Scores in a Complete Search System Recap: Queries as vectors ▪ Key idea 1: Do the same for queries: represent them as vectors in the space ▪ Key idea 2: Rank documents according to their proximity to the query in this space ▪ proximity = similarity of vectors 3

Computing Scores in a Complete Search System Recap: cosine(query, document Dot product Unit vectors cos(g, d) ∑ cos(a, d) is the cosine similarity of g and d...on equivalently, the cosine of the angle between g and d
Computing Scores in a Complete Search System Recap: cosine(query,document) = = = = • = • = V i i V i i V i i i q d q d d d q q q d q d q d 1 2 1 2 1 cos( , ) Dot product Unit vectors cos(q,d) is the cosine similarity of q and d … or, equivalently, the cosine of the angle between q and d. 4

Computing Scores in a Complete Search System This lecture Speeding up vector space ranking Putting together a complete search system Will require learning about a number of miscellaneous topics and heuristics
Computing Scores in a Complete Search System This lecture ▪ Speeding up vector space ranking ▪ Putting together a complete search system ▪ Will require learning about a number of miscellaneous topics and heuristics 5

Computing Scores in a Complete Search System Efficient Ranking Computing cosine scores CosineSCORE(q float Scores[N=0 2 float Length[N] 3 for each query term t 4 do calculate Wt g and fetch postings list for t or each pair(d, tft. d )in postings list lo Scores[d+=Wd×Wtq 7 Read the array Length 8 for each d 9 do Scores d= Scores[d/Length[d 10 return Top K components of Scores
Computing Scores in a Complete Search System Computing cosine scores Efficient Ranking 6

Computing Scores in a Complete Search System Efficient Ranking Efficient cosine rankin Find the k docs in the collection "nearest to the query =K largest query-doc cosines ■ Efficient ran king: Computing a single cosine efficiently Choosing the k largest cosine values efficiently Can we do this without computing all n cosines?
Computing Scores in a Complete Search System Efficient cosine ranking ▪ Find the K docs in the collection “nearest” to the query K largest query-doc cosines. ▪ Efficient ranking: ▪ Computing a single cosine efficiently. ▪ Choosing the K largest cosine values efficiently. ▪ Can we do this without computing all N cosines? Efficient Ranking 7

Computing Scores in a Complete Search System Efficient Ranking Efficient cosine rankin What we re doing in effect solving the k-nearest neighbor problem for a query vector In general, we do not know how to do this efficiently for high-dimensional spaces But it is solvable for short queries and standard indexes support this well
Computing Scores in a Complete Search System Efficient cosine ranking ▪ What we’re doing in effect: solving the K-nearest neighbor problem for a query vector ▪ In general, we do not know how to do this efficiently for high-dimensional spaces ▪ But it is solvable for short queries, and standard indexes support this well Efficient Ranking 8

Computing Scores in a Complete Search System Efficient Ranking Special case -unweighted queries No weighting on query terms Assume each query term occurs only once Then for ranking, dont need to normalize query vector Slight simplification of agorithm from Lecture 7
Computing Scores in a Complete Search System Special case – unweighted queries ▪ No weighting on query terms ▪ Assume each query term occurs only once ▪ Then for ranking, don’t need to normalize query vector ▪ Slight simplification of algorithm from Lecture 7 Efficient Ranking 9

Computing Scores in a Complete Search System Efficient Ranking Faster cosine: unweighted query FASTCOSINESCORE(Q 1 float Scores[N=0 2 for each d 3 do Initialize Length[d] to the length of doc d 4 for each query term t 5 do calculate wtg and fetch postings list for t 6 for each pair(d, tft, d )in postings list 7 do add wf, d to Scores[d] 8 Read the array Length[dI g for each d 10 do Divide Scoresld] by Length[dI 11 return Top K components of Scoresll Figure 7.1 A faster algorithm for vector space scores
Computing Scores in a Complete Search System Faster cosine: unweighted query Efficient Ranking 10
按次数下载不扣除下载券;
注册用户24小时内重复下载只扣除一次;
顺序:VIP每日次数-->可用次数-->下载券;
- 上海海事大学:《数字图像处理》课程教学资源(PPT课件讲稿)Unit 7 Introduction to Digital Image Processing.ppt
- Performance Evaluation of Long Range Dependent Queues(PPT讲稿).pptx
- 《C语言程序设计》课程电子教案(PPT课件讲稿)第二章 基本数据类型及运算.ppt
- 南京大学:《面向对象技术 OOT》课程教学资源(PPT课件讲稿)模式&框架 Pattern & Framework.ppt
- 《数据库系统概论 An Introduction to Database System》课程教学资源(PPT课件讲稿)第二讲 关系数据库.ppt
- 《计算机辅助设计》课程介绍.pdf
- 沈阳工程学院:《面向对象程序设计》课程教学大纲(适用专业:计算机科学与技术专业).pdf
- 《编译原理》课程教学资源(PPT课件讲稿)从正则表达式到有限自动机.pptx
- Introduction to Computing Using Java(PPT讲稿)Java Language Basics.ppt
- 《物联网导论》课程教学资源(PPT课件讲稿)第2章 自动识别技术与RFID.ppt
- 《计算机维修》课程教学资源(PPT课件讲稿)第3章 磁盘工具.ppt
- 《数据结构》课程PPT教学课件(讲稿)第一章 数据结构基础.ppsx
- 华北科技学院:图像的采集与处理(PPT课件讲稿)Photoshop CS.ppt
- 《JAVA与面向对象编程》课程教学资源(PPT课件讲稿)第二章 Java语法基础.ppt
- 《C语言程序设计》课程电子教案(PPT教学课件)第四章 选择结构程序设计.ppt
- 西安交通大学:《微机原理与接口技术》课程教学资源(PPT课件讲稿)第7章 模拟量输入输出接口.ppt
- Wrapper Generation and HTML Reduction(PPT讲稿).ppt
- 《微机原理》课程教学资源(PPT课件讲稿)第九章 可编程接口芯片及其与CPU的接口.ppt
- 面向服务的业务流程管理(PPT讲稿)Business Process Modeling Notation(BPMN), Business Process Executive Language(BPEL), and XML Process Definition Language(XPDL).pptx
- 上海交通大学:《微机原理与接口技术》课程教学资源(教学大纲)信息与计算科学专业.pdf
- 《数据库基础》课程教学资源(PPT课件讲稿)第四章 数据查询.ppt
- 北京大学:C++模板与STL库介绍(PPT讲稿).ppt
- Computer Graphics(PPT讲稿)INFORMATION VISUALIZATION.pptx
- 档案数字化基本程序与要求(PPT讲稿).ppt
- 中国科学技术大学:《计算机体系结构》课程教学资源(PPT课件讲稿)第5章 指令级并行.pptx
- 上海交通大学:《程序设计》课程教学资源(PPT课件讲稿)第14章 输入输出与文件.ppt
- 中国科学技术大学:《计算机体系结构》课程教学资源(PPT课件讲稿)第7章 多处理器及线程级并行.ppt
- 南京大学:《编译原理》课程教学资源(PPT课件讲稿)第五章 语法制导的翻译.ppt
- 河南中医药大学:《网络技术实训》课程教学资源(PPT课件讲稿)第一阶段 组网(主讲:路景鑫).pptx
- 《SQL基础教程》课程教学资源(PPT课件讲稿)第6章 数据操作与SQL语句.ppt
- 《计算机基础及C语言程序设计》课程PPT教学课件(讲稿)第1章 概论.ppt
- 西安交通大学:《网络与信息安全》课程PPT教学课件(网络入侵与防范)身份认证.ppt
- 《计算机网络和因特网》教学资源(PPT讲稿)网络互连(概念, IP 地址, IP 路由, IP 数据报, 地址解析).ppt
- 《高级语言程序设计》课程教学资源(试卷习题)试题四(无答案).doc
- 上海交通厌:《通信网络》课程教学资源(PPT讲稿)DELAY MODELS IN DATA NETWORKS、LITTLE’S LAW、ARRIVAL MODEL、M/M/X QUEUING MODELS.pptx
- 《软件工程》课程教学资源(PPT课件讲稿)第7章 软件测试.ppt
- 《计算机网络安全》课程教学资源(PPT课件讲稿)第二章 密码学技术.ppt
- 《编译原理》课程教学资源(PPT课件讲稿)语法分析 Syntax analysis(自底向上分析 Bottom-Up Parsing).ppt
- 中国人民大学:《数据库系统概论 An Introduction to Database System》课程教学资源(PPT课件讲稿)第一章 绪论.ppt
- 《计算机组成原理》课程教学资源(PPT课件讲稿)第四章 存储器.ppt