Harbin Institute of Technology: Word Sense Disambiguation (PPT Lecture Slides)

Word Sense Disambiguation
Zhang Yu (zhangyu@ir.hit.edu.cn)

Overview of the Problem
◼ Problem: many words have different meanings or senses, i.e., there is ambiguity about how they are to be specifically interpreted (e.g., differentiate).
◼ Task: determine which of the senses of an ambiguous word is invoked in a particular use of the word by looking at the context of its use.
◼ Note: more often than not, the different senses of a word are closely related.

Ambiguity Resolution
◼ Bank
 ◼ The rising ground bordering a lake, river, or sea
 ◼ An establishment for the custody, loan, exchange, or issue of money, for the extension of credit, and for facilitating the transmission of funds
◼ Title
 ◼ Name/heading of a book, statue, work of art or music, etc.
 ◼ Material at the start of a film
 ◼ The right of legal ownership (of land)
 ◼ The document that is evidence of that right
 ◼ An appellation of respect attached to a person's name
 ◼ A written work (synecdoche: the part stands for the whole)

Overview of Our Discussion
◼ Methodology
 ◼ Supervised Disambiguation: based on a labeled training set.
 ◼ Dictionary-Based Disambiguation: based on lexical resources such as dictionaries and thesauri.
 ◼ Unsupervised Disambiguation: based on unlabeled corpora.

Methodological Preliminaries
◼ Supervised versus Unsupervised Learning: in supervised learning (classification), the sense label of each word occurrence is provided in the training set; in unsupervised learning (clustering), it is not.
◼ Pseudowords: used to generate artificial evaluation data for comparing and improving text-processing algorithms, e.g., replace each occurrence of two words (e.g., bell and book) with a pseudoword (e.g., bell-book); the word that was replaced serves as the gold label (see the sketch below).
◼ Upper and Lower Bounds on Performance: used to find out how well an algorithm performs relative to the difficulty of the task.
 ◼ Upper: human performance.
 ◼ Lower: baseline using the highest-frequency alternative (choosing the best of 2 senses versus the best of 10).
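
A minimal sketch of pseudoword generation in Python (the slides give no code; the function name, regex approach, and toy sentences are illustrative assumptions):

```python
import re

def make_pseudoword_data(sentences, w1="bell", w2="book", pseudo="bell-book"):
    """Replace each occurrence of w1 or w2 with the pseudoword;
    the word that was replaced serves as the gold 'sense' label."""
    pattern = re.compile(r"\b(%s|%s)\b" % (w1, w2))
    examples = []
    for sent in sentences:
        for m in pattern.finditer(sent):
            gold = m.group(1)  # the true "sense" of the pseudoword
            ctx = sent[:m.start()] + pseudo + sent[m.end():]
            examples.append((ctx, gold))
    return examples

sents = ["the church bell rang at noon", "she read the book twice"]
print(make_pseudoword_data(sents))
# [('the church bell-book rang at noon', 'bell'),
#  ('she read the bell-book twice', 'book')]
```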

Supervised Disambiguation
◼ Training set: exemplars where each occurrence of the ambiguous word w is annotated with a semantic label. This becomes a statistical classification problem: assign w some sense s_k in its context c.
◼ Approaches:
 ◼ Bayesian Classification: the context of occurrence is treated as a bag of words without structure, but it integrates information from many words in a context window.
 ◼ Information Theory: looks only at the most informative feature in the context, which may be sensitive to text structure.
 ◼ Many more approaches (see Chapter 16 or a text on Machine Learning (ML)) could be applied.

Supervised Disambiguation: Bayesian Classification
◼ (Gale et al., 1992): look at the words around an ambiguous word in a large context window. Each content word contributes potentially useful information about which sense of the ambiguous word is likely to be used with it. The classifier does no feature selection; it simply combines the evidence from all features, assuming they are independent.
◼ Bayes decision rule: decide s' if P(s'|c) > P(s_k|c) for all s_k ≠ s' (see the sketch below).
◼ Optimal because it minimizes the probability of error; for each individual case it selects the class with the highest conditional probability (and hence the lowest error rate).
◼ The error rate for a sequence of decisions is then also minimized.
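
A minimal sketch of the decision rule, assuming the posteriors P(s_k|c) have already been computed somehow; the toy probabilities are purely illustrative:

```python
def bayes_decide(posteriors):
    """Bayes decision rule: choose the sense s' with the highest P(s|c)."""
    return max(posteriors, key=posteriors.get)

# Toy posteriors for "bank" in a context mentioning "river":
print(bayes_decide({"ground": 0.8, "money": 0.2}))  # -> 'ground'
```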

Supervised Disambiguation: Bayesian Classification
◼ We do not usually know P(s_k|c), but we can use Bayes' rule to compute it:
 P(s_k|c) = (P(c|s_k) / P(c)) × P(s_k)
◼ P(s_k) is the prior probability of s_k, i.e., the probability of sense s_k without any contextual information.
◼ When updating the prior with evidence from the context (i.e., P(c|s_k)/P(c)), we obtain the posterior probability P(s_k|c).
◼ If all we want to do is select the correct class, we can ignore P(c). We also use logs to simplify computation. Assign word w the sense
 s' = argmax_{s_k} P(s_k|c)
 = argmax_{s_k} P(c|s_k) × P(s_k)
 = argmax_{s_k} [log P(c|s_k) + log P(s_k)]
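
A small worked sketch of the argmax (toy numbers are assumptions): P(c) is omitted because it is the same for every sense, and the log form gives the same winner while avoiding floating-point underflow when P(c|s_k) is a long product:

```python
import math

prior = {"ground": 0.3, "money": 0.7}          # P(s_k), toy values
likelihood = {"ground": 0.04, "money": 0.001}  # P(c|s_k) for one fixed context c

# s' = argmax_{s_k} P(c|s_k) * P(s_k); dividing by P(c) would not change the winner.
best = max(prior, key=lambda s: likelihood[s] * prior[s])

# Same argmax in log space: log P(c|s_k) + log P(s_k).
best_log = max(prior, key=lambda s: math.log(likelihood[s]) + math.log(prior[s]))

assert best == best_log
print(best)  # -> 'ground'
```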

Bayesian Classification: Naïve Bayes
◼ Naïve Bayes:
 ◼ is widely used in ML due to its ability to efficiently combine evidence from a wide variety of features;
 ◼ can be applied if the state of the world we base our classification on can be described as a series of attributes;
 ◼ in this case, we describe the context of w in terms of the words v_j that occur in the context.
◼ Naïve Bayes assumption: the attributes used for classification are conditionally independent:
 P(c|s_k) = P({v_j | v_j in c} | s_k) = ∏_{v_j in c} P(v_j|s_k)
◼ Two consequences (see the sketch below):
 ◼ The structure and linear ordering of words is ignored: a bag-of-words model.
 ◼ The presence of one word is treated as independent of another, which is clearly untrue in text.
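
A sketch of the independence assumption in code, assuming a table of estimated P(v_j|s_k) values; the tiny floor for unseen words stands in for proper smoothing and is purely an assumption:

```python
from math import prod  # Python 3.8+

def context_likelihood(context_words, sense, p_word_given_sense):
    """Naive Bayes assumption: P(c|s_k) = product over v_j in c of P(v_j|s_k).
    Word order and word-word dependencies are ignored (bag of words)."""
    return prod(p_word_given_sense.get((v, sense), 1e-9)  # floor for unseen words
                for v in context_words)

# Toy table of P(v_j|s_k) for the two senses of "bank":
p = {("river", "ground"): 0.05, ("money", "money"): 0.08}
print(context_likelihood(["river", "money"], "ground", p))  # 0.05 * 1e-9
```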

Bayesian Classification: Naïve Bayes
◼ Although the Naïve Bayes assumption is incorrect in the context of text processing, it often does quite well, partly because the decisions made can be optimal even in the face of the inaccurate assumption.
◼ Decision rule for Naïve Bayes: decide s' if
 s' = argmax_{s_k} [log P(s_k) + Σ_{v_j in c} log P(v_j|s_k)]
◼ P(v_j|s_k) and P(s_k) are computed via Maximum-Likelihood Estimation, perhaps with appropriate smoothing, from a labeled training corpus:
 P(v_j|s_k) = C(v_j, s_k) / C(s_k)
 P(s_k) = C(s_k) / C(w)
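
Putting the pieces together, a self-contained sketch of a Naïve Bayes disambiguator built on the MLE counts above; the add-one smoothing, function names, and two-example toy corpus are assumptions for illustration:

```python
import math
from collections import Counter, defaultdict

def train(labeled_examples):
    """Estimate the counts behind P(s_k) and P(v_j|s_k) from
    (context_words, sense) pairs."""
    sense_counts = Counter()            # C(s_k)
    word_counts = defaultdict(Counter)  # C(v_j, s_k)
    vocab = set()
    for words, sense in labeled_examples:
        sense_counts[sense] += 1
        for v in words:
            word_counts[sense][v] += 1
            vocab.add(v)
    return sense_counts, word_counts, vocab

def disambiguate(context_words, sense_counts, word_counts, vocab):
    """s' = argmax_{s_k} [log P(s_k) + sum over v_j of log P(v_j|s_k)],
    with add-one smoothing on P(v_j|s_k)."""
    total = sum(sense_counts.values())  # C(w): all occurrences of w
    V = len(vocab)
    def score(s):
        logp = math.log(sense_counts[s] / total)  # log P(s_k)
        n = sum(word_counts[s].values())
        for v in context_words:
            logp += math.log((word_counts[s][v] + 1) / (n + V))
        return logp
    return max(sense_counts, key=score)

data = [(["river", "water", "fishing"], "ground"),
        (["money", "loan", "credit"], "money")]
model = train(data)
print(disambiguate(["boat", "river"], *model))  # -> 'ground'
```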