Deep Natural Language Processing course slides (Natural language processing with deep learning), Lecture 07: Language Model & Distributed Representation (4/6)


Xi'an Jiaotong University
Natural Language Processing with Deep Learning
Language Model & Distributed Representation (4)
Chen Li, cli@xjtu.edu.cn, 2023

Outline
1. RNN-based LM
2. Seq2seq Model
3. Attention Mechanism

RNN LM: Recurrent Neural Networks (RNN)
Main idea: use the same weights $W$ at every time step.
[Diagram: a variable-length input sequence $x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}$ feeds a chain of hidden states $h^{(1)}, h^{(2)}, h^{(3)}, h^{(4)}$, each producing an optional output $\hat{y}^{(t)}$.]

RNN LM
Word vectors $x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}$ (one-hot or distributed representations), with $x^{(t)} \in \mathbb{R}^{|V|}$, for the example input "the girl opened her".
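A minimal NumPy sketch of the one-hot representation $x^{(t)} \in \mathbb{R}^{|V|}$; the toy vocabulary and the `one_hot` helper are illustrative assumptions, not values from the slides:

```python
import numpy as np

# Toy vocabulary covering the slides' running example (illustrative only).
vocab = ["the", "girl", "opened", "her", "books", "laptops", "a", "zoo"]
word2id = {w: i for i, w in enumerate(vocab)}

def one_hot(word: str) -> np.ndarray:
    """Return the one-hot vector x^(t) in R^{|V|} for a word."""
    x = np.zeros(len(vocab))
    x[word2id[word]] = 1.0
    return x

# x^(1)..x^(4) for the example input "the girl opened her"
xs = [one_hot(w) for w in "the girl opened her".split()]
print(xs[0])  # [1. 0. 0. 0. 0. 0. 0. 0.]
```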

RNN LM
Word embedding: $e^{(t)} = E x^{(t)}$, with the same embedding matrix $E$ shared across all time steps.
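A short sketch of the embedding step; the dimensions and the random $E$ are assumptions for illustration. It also shows why $e^{(t)} = E x^{(t)}$ reduces to a table lookup in practice:

```python
import numpy as np

V, d = 8, 4                    # vocabulary size |V| and embedding dim (illustrative)
rng = np.random.default_rng(0)
E = rng.normal(size=(d, V))    # embedding matrix, one column per word

x = np.zeros(V); x[2] = 1.0    # one-hot vector x^(t) for word id 2
e = E @ x                      # e^(t) = E x^(t)

# Multiplying by a one-hot vector just selects a column of E,
# so real implementations do a lookup instead of a matrix product:
assert np.allclose(e, E[:, 2])
```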

RNN LM
Hidden layer: $h^{(t)} = \sigma(W_h h^{(t-1)} + W_e e^{(t)} + b_1)$, where $h^{(0)}$ is the initial hidden state.
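A sketch of the recurrence, assuming $\sigma = \tanh$ and small illustrative dimensions; note the loop must run sequentially, because $h^{(t)}$ depends on $h^{(t-1)}$:

```python
import numpy as np

d, n_h = 4, 5                       # embedding and hidden sizes (illustrative)
rng = np.random.default_rng(0)
W_h = rng.normal(size=(n_h, n_h))   # the same W_h is reused at every step
W_e = rng.normal(size=(n_h, d))
b_1 = np.zeros(n_h)

def run_rnn(es):
    """Roll the recurrence over a list of embeddings e^(1)..e^(T)."""
    h = np.zeros(n_h)               # h^(0): initial hidden state
    hs = []
    for e in es:                    # sequential: h^(t) needs h^(t-1)
        h = np.tanh(W_h @ h + W_e @ e + b_1)
        hs.append(h)
    return hs

hs = run_rnn([rng.normal(size=d) for _ in range(4)])  # four time steps
```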

RNN LM
Output layer: $\hat{y}^{(t)} = \mathrm{softmax}(U h^{(t)} + b_2) \in \mathbb{R}^{|V|}$.
For the running example, $\hat{y}^{(4)} = P(x^{(5)} \mid \text{the girl opened her})$ is a distribution over candidate next words such as "books", "laptops", "a", "zoo".
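A sketch of the output layer; $U$, $b_2$, and the sizes are illustrative assumptions rather than the slides' values:

```python
import numpy as np

V, n_h = 8, 5                       # vocabulary and hidden sizes (illustrative)
rng = np.random.default_rng(0)
U = rng.normal(size=(V, n_h))
b_2 = np.zeros(V)

def softmax(z):
    z = z - z.max()                 # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

h4 = rng.normal(size=n_h)           # stands in for h^(4) from the recurrence
y4 = softmax(U @ h4 + b_2)          # y^(4) = P(x^(5) | the girl opened her)
print(y4.sum())                     # sums to 1: a distribution over the vocabulary
```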

RNN LM
Advantages
- Can process variable-length sentences (see the sketch after this list);
- Theoretically, step $t$ can use information from many previous steps;
- The size of the model doesn't grow as the input becomes longer;
- Each step uses the same $W$, which saves computation.

Disadvantages
- Recurrent computation is slow;
- It is difficult to transmit information from earlier steps completely.
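A brief sketch illustrating two of the listed properties: the same fixed-size parameters handle inputs of any length, and the time steps cannot be parallelized, which is the source of the slowness. Dimensions and inputs are illustrative assumptions:

```python
import numpy as np

d, n_h = 4, 5
rng = np.random.default_rng(0)
W_h, W_e = rng.normal(size=(n_h, n_h)), rng.normal(size=(n_h, d))

def last_hidden(es):
    h = np.zeros(n_h)
    for e in es:                    # inherently sequential: no parallelism over t
        h = np.tanh(W_h @ h + W_e @ e)
    return h

# The parameter count is fixed regardless of sequence length:
short = [rng.normal(size=d) for _ in range(3)]
long = [rng.normal(size=d) for _ in range(50)]
print(last_hidden(short).shape, last_hidden(long).shape)  # both (5,)
```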
