《深度自然语言处理》 (Natural language processing with deep learning) course slides, 07 Language Model & Distributed Representation (4/6)

Xi'an Jiaotong University
Natural language processing with deep learning
Language Model & Distributed Representation (4)
Chen Li, cli@xjtu.edu.cn, 2023

Outlines
1. RNN based LM
2. Seq2seq Model
3. Attention Mechanism

RNN LM
Recurrent Neural Networks (RNN)
• Main idea: use the same weights $W$ at every time step.
[Figure: an unrolled RNN over a variable-length input sequence $x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}$, with hidden states $h^{(1)}, h^{(2)}, h^{(3)}, h^{(4)}$ linked by the shared $W$ and optional outputs $\hat{y}^{(1)}, \dots, \hat{y}^{(4)}$.]

RNN LM
• Word vectors (one-hot or distributed representation): $x^{(t)} \in \mathbb{R}^{|V|}$.
[Figure: the input sequence "the girl opened her" encoded as word vectors $x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}$.]
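To make the one-hot encoding concrete, here is a minimal NumPy sketch; the toy vocabulary and the `one_hot` helper are illustrative assumptions, not part of the slides.

```python
# Minimal sketch of the slide's one-hot input encoding over a toy vocabulary.
import numpy as np

vocab = ["the", "girl", "opened", "her", "books", "laptops", "a", "zoo"]
word2id = {w: i for i, w in enumerate(vocab)}

def one_hot(word: str) -> np.ndarray:
    """Return x^(t) in R^{|V|}: all zeros except a 1 at the word's index."""
    x = np.zeros(len(vocab))
    x[word2id[word]] = 1.0
    return x

x1 = one_hot("the")   # x^(1)
print(x1)             # [1. 0. 0. 0. 0. 0. 0. 0.]
```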

RNN LM
• Word embedding: $e^{(t)} = E x^{(t)}$.
[Figure: as before, with each one-hot vector $x^{(t)}$ mapped through the embedding matrix $E$ to its embedding $e^{(t)}$.]
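The embedding step $e^{(t)} = E x^{(t)}$ amounts to selecting one column of $E$, as the sketch below shows; the dimensions and random initialization are illustrative assumptions.

```python
# Sketch of e^(t) = E x^(t): multiplying the |V|-dim one-hot vector by the
# embedding matrix selects a single column of E.
import numpy as np

V, d = 8, 4                        # vocabulary size |V| and embedding dim (illustrative)
rng = np.random.default_rng(0)
E = rng.normal(size=(d, V))        # embedding matrix E in R^{d x |V|}

x = np.zeros(V); x[2] = 1.0        # one-hot x^(t) for word index 2
e = E @ x                          # e^(t) = E x^(t)
assert np.allclose(e, E[:, 2])     # identical to a plain column lookup
```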

RNN LM
• Hidden layer: $h^{(t)} = \sigma(W_h h^{(t-1)} + W_e e^{(t)} + b_1)$, where $h^{(0)}$ is the initial hidden state.
[Figure: the embeddings $e^{(1)}, \dots, e^{(4)}$ feed the hidden states $h^{(1)}, \dots, h^{(4)}$; the weights $W_h$ and $W_e$ are shared across all time steps.]
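A minimal sketch of the recurrence, assuming $\sigma = \tanh$ (the slides do not fix the nonlinearity) and illustrative sizes and random weights:

```python
# Sketch of h^(t) = sigma(W_h h^(t-1) + W_e e^(t) + b_1) with sigma = tanh.
import numpy as np

d, H = 4, 5                              # embedding dim, hidden size (illustrative)
rng = np.random.default_rng(0)
W_h = rng.normal(size=(H, H)) * 0.1      # hidden-to-hidden weights
W_e = rng.normal(size=(H, d)) * 0.1      # input-to-hidden weights
b_1 = np.zeros(H)

h = np.zeros(H)                          # h^(0): initial hidden state
for e_t in rng.normal(size=(4, d)):      # stand-ins for e^(1)..e^(4)
    h = np.tanh(W_h @ h + W_e @ e_t + b_1)   # same W_h, W_e at every step
print(h.shape)                           # (5,)
```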

RNN LM
• Output layer: $\hat{y}^{(t)} = \mathrm{softmax}(U h^{(t)} + b_2) \in \mathbb{R}^{|V|}$.
• At step 4, $\hat{y}^{(4)} = P(x^{(5)} \mid \text{the girl opened her})$: a distribution over the vocabulary ("books", "laptops", ..., "a", "zoo").
[Figure: the full unrolled RNN LM, from word vectors through embeddings and hidden states to the softmax output at step 4.]
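Putting the three layers together, here is a self-contained sketch of one forward pass over "the girl opened her"; the vocabulary, dimensions, and random weights are stand-ins, so the printed probabilities are arbitrary, but the shapes and data flow follow the slides.

```python
# End-to-end sketch of the RNN LM forward pass: one-hot inputs, embedding,
# recurrence, and a softmax distribution over the vocabulary.
import numpy as np

vocab = ["the", "girl", "opened", "her", "books", "laptops", "a", "zoo"]
V, d, H = len(vocab), 4, 5
rng = np.random.default_rng(0)
E   = rng.normal(size=(d, V)) * 0.1      # embedding matrix
W_h = rng.normal(size=(H, H)) * 0.1      # hidden-to-hidden weights
W_e = rng.normal(size=(H, d)) * 0.1      # input-to-hidden weights
U   = rng.normal(size=(V, H)) * 0.1      # hidden-to-output weights
b_1, b_2 = np.zeros(H), np.zeros(V)

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    return np.exp(z) / np.exp(z).sum()

h = np.zeros(H)                          # h^(0)
for word in ["the", "girl", "opened", "her"]:
    x = np.zeros(V); x[vocab.index(word)] = 1.0   # x^(t)
    e = E @ x                                     # e^(t) = E x^(t)
    h = np.tanh(W_h @ h + W_e @ e + b_1)          # h^(t)
y4 = softmax(U @ h + b_2)                         # y-hat^(4) in R^{|V|}
print({w: round(float(p), 3) for w, p in zip(vocab, y4)})  # P(x^(5) | context)
```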

RNN LM
Advantages
• Can process variable-length sentences;
• In theory, step $t$ can use information from many previous steps;
• The size of the model does not grow as the input becomes longer;
• Every step applies the same weights $W$, so the number of parameters stays fixed.
Disadvantages
• Recurrent computation is slow, since the steps must be processed sequentially;
• In practice, it is difficult to carry information from earlier steps forward completely.
[Figure: the same unrolled RNN LM diagram as on the previous slide.]
