Key Technologies of Web Search and Mining (Web Search and Mining): Course Teaching Resources (PPT Lecture Notes), Lecture 12: Language Models

Lecture 12: Language Models
Web Search and Mining

Recap
▪ Probabilistic models:
▪ Binary independence model
▪ Bayesian networks for IR
▪ Language models

This lecture
▪ The Language Model Approach to IR
▪ Basic query generation model
▪ Alternative models

Standard Probabilistic IR
[Figure: an information need is expressed as a query; the query is matched against the documents d1, d2, …, dn of the document collection, and each document is scored by $P(R \mid Q, d)$.]

IR based on Language Model (LM)
[Figure: each document d1, d2, …, dn of the document collection has its own language model $M_{d_1}, M_{d_2}, \dots, M_{d_n}$; the query that expresses the information need is treated as generated from a document model and scored by $P(Q \mid M_d)$.]
▪ A common search heuristic is to use words that you expect to find in matching documents as your query.
▪ The LM approach directly exploits that idea!
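The generation view can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own implementation: the toy documents, the function names `unigram_mle` and `query_likelihood`, and the use of unsmoothed maximum-likelihood estimates are all assumptions made here.

```python
from collections import Counter

def unigram_mle(text):
    """Maximum-likelihood unigram model: P(w | M_d) = count(w, d) / |d|."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}

def query_likelihood(query, model):
    """P(Q | M_d) as a product of per-term probabilities.

    Terms unseen in the document get probability 0 (no smoothing here).
    """
    p = 1.0
    for term in query.lower().split():
        p *= model.get(term, 0.0)
    return p

# Hypothetical toy collection (not from the lecture).
docs = {
    "d1": "the cat sat on the mat",
    "d2": "the dog chased the cat",
    "d3": "stock markets fell sharply",
}
models = {doc_id: unigram_mle(text) for doc_id, text in docs.items()}

query = "the cat"
# Rank documents by how likely their model is to generate the query.
ranking = sorted(docs, key=lambda d: query_likelihood(query, models[d]), reverse=True)
print(ranking)
```

Without smoothing, any document missing a single query term scores exactly zero, which is why practical systems do not use the raw estimates shown here.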

Formal Language (Model)
▪ Traditional generative model: generates strings
  ▪ Finite state machines or regular grammars, etc.
▪ Example: the model generates "I wish", "I wish I wish", "I wish I wish I wish", …, but cannot generate "*wish I wish".
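One way to make the example concrete is a regular expression that accepts exactly the strings "I wish" repeated one or more times. The pattern and the helper name `in_language` are illustrative assumptions, not part of the slides.

```python
import re

# Regular expression for the language {"I wish", "I wish I wish", ...}.
I_WISH = re.compile(r"I wish( I wish)*")

def in_language(s):
    """Return True if the whole string s is generated by the pattern."""
    return I_WISH.fullmatch(s) is not None

print(in_language("I wish I wish"))
print(in_language("wish I wish"))
```

A formal language model of this kind only answers yes or no; it assigns no probabilities to the strings it accepts.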

Stochastic Language Models
▪ Models probability of generating strings in the language (commonly all strings over alphabet $\Sigma$)
Model M:
  P(w | M)   w
  0.2        the
  0.1        a
  0.01       man
  0.01       woman
  0.03       said
  0.02       likes
  …
s = the man likes the woman
$$P(s \mid M) = 0.2 \times 0.01 \times 0.02 \times 0.2 \times 0.01 = 0.00000008$$
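The multiplication on this slide can be checked with a short sketch. The dictionary holds only the six words listed for Model M (the slide's "…" indicates more entries that are not shown), and the function name `string_probability` is an assumption made here.

```python
import math

# Word probabilities listed on the slide for Model M (partial vocabulary).
model_m = {
    "the": 0.2,
    "a": 0.1,
    "man": 0.01,
    "woman": 0.01,
    "said": 0.03,
    "likes": 0.02,
}

def string_probability(sentence, model):
    """Multiply the per-word probabilities of a whitespace-tokenized string.

    Raises KeyError for words missing from the (partial) model.
    """
    return math.prod(model[w] for w in sentence.split())

p = string_probability("the man likes the woman", model_m)
print(p)
```

Up to floating-point rounding, the result matches the slide's value of 0.00000008.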

Stochastic Language Models
▪ Model probability of generating any string
  w          P(w | M1)   P(w | M2)
  the        0.2         0.2
  class      0.01        0.0001
  sayst      0.0001      0.03
  pleaseth   0.0001      0.02
  yon        0.0001      0.1
  maiden     0.0005      0.01
  woman      0.01        0.0001
s = the class pleaseth yon maiden
  Under M1: 0.2 × 0.01 × 0.0001 × 0.0001 × 0.0005
  Under M2: 0.2 × 0.0001 × 0.02 × 0.1 × 0.01
$$P(s \mid M_2) > P(s \mid M_1)$$
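The comparison between the two models can be reproduced as follows. Summing log-probabilities instead of multiplying raw probabilities is a choice made for this sketch (it avoids underflow on longer strings); the slides multiply directly. The names `m1`, `m2`, and `log_prob` are assumptions.

```python
import math

# Word probabilities for the two models, as listed on the slide.
m1 = {"the": 0.2, "class": 0.01, "sayst": 0.0001, "pleaseth": 0.0001,
      "yon": 0.0001, "maiden": 0.0005, "woman": 0.01}
m2 = {"the": 0.2, "class": 0.0001, "sayst": 0.03, "pleaseth": 0.02,
      "yon": 0.1, "maiden": 0.01, "woman": 0.0001}

def log_prob(sentence, model):
    """Sum of log word probabilities, i.e. log P(s | M) for a unigram model."""
    return sum(math.log(model[w]) for w in sentence.split())

s = "the class pleaseth yon maiden"
lp1 = log_prob(s, m1)
lp2 = log_prob(s, m2)

# The model with the larger log-probability is the more likely generator of s.
print("M2 more likely" if lp2 > lp1 else "M1 more likely")
```

Exponentiating the two sums gives roughly 1e-14 for M1 and 4e-10 for M2, consistent with the inequality on the slide.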

Stochastic Language Models
▪ A statistical model for generating text
▪ Probability distribution over strings in a given language
$$P(w_1 w_2 w_3 w_4 \mid M) = P(w_1 \mid M)\,P(w_2 \mid M, w_1)\,P(w_3 \mid M, w_1 w_2)\,P(w_4 \mid M, w_1 w_2 w_3)$$
(The slide draws the four tokens as colored dots; they are written here as $w_1, \dots, w_4$.)
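The four-token decomposition is an instance of the chain rule of probability. For a string of arbitrary length $n$ (the index notation is introduced here and is not on the slide), it can be written as:

```latex
P(w_1 w_2 \cdots w_n \mid M) = \prod_{i=1}^{n} P(w_i \mid M, w_1 \cdots w_{i-1})
```

For $i = 1$ the conditioning history is empty, so the first factor is simply $P(w_1 \mid M)$.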

Unigram and higher-order models
$$P(w_1 w_2 w_3 w_4) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2)\,P(w_4 \mid w_1 w_2 w_3)$$
▪ Unigram Language Models: $P(w_1)\,P(w_2)\,P(w_3)\,P(w_4)$. Easy. Effective!
▪ Bigram (generally, n-gram) Language Models: $P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_2)\,P(w_4 \mid w_3)$
▪ Other Language Models
  ▪ Grammar-based models (PCFGs), etc.
  ▪ Probably not the first thing to try in IR
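The difference between the unigram and bigram factorizations can be seen on a small example. The sketch below estimates both models by maximum likelihood from a hypothetical twelve-token corpus; the corpus, the function names, and the absence of smoothing are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy corpus (not from the lecture).
corpus = "the man likes the woman . the woman likes the man .".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))
# Count each token only where it has a successor, so bigram estimates sum to 1.
context_counts = Counter(corpus[:-1])
total = len(corpus)

def p_unigram(w):
    """MLE estimate of P(w)."""
    return unigram_counts[w] / total

def p_bigram(w, prev):
    """MLE estimate of P(w | prev) = count(prev, w) / count(prev as context)."""
    return bigram_counts[(prev, w)] / context_counts[prev]

def unigram_score(sentence):
    """P(w1) P(w2) ... P(wn): every word is generated independently."""
    p = 1.0
    for w in sentence.split():
        p *= p_unigram(w)
    return p

def bigram_score(sentence):
    """P(w1) P(w2 | w1) ... P(wn | wn-1): each word depends on its predecessor."""
    words = sentence.split()
    p = p_unigram(words[0])
    for prev, w in zip(words, words[1:]):
        p *= p_bigram(w, prev)
    return p

s = "the man likes the woman"
print(unigram_score(s), bigram_score(s))
```

On this corpus the bigram model assigns the sentence a much higher probability than the unigram model, because it captures which words tend to follow which.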