
"Deep Natural Language Processing" course slides (Natural language processing with deep learning) 08 Language Model & Distributed Representation (5/6)


Xi'an Jiaotong University
Natural language processing with deep learning
Language Model & Distributed Representation (5)
Chen Li  cli@xjtu.edu.cn  2023

Outlines
1. Self-attention
2. Transformer
3. Pre-training LM

Self-Attention

y_t = f(x_t, A, B)

• where A and B are another sequence (matrix)
• If we take A (key) = B (value) = X (query), then it is called self-attention.
• It means comparing x_t with all the original words and calculating y_t at last.
• This departs completely from the traditional RNN or CNN framework.
• It is faster and can directly capture global information (see the sketch below).
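To make the special case A (key) = B (value) = X (query) concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. The function name `self_attention`, the scaling by sqrt(d), and the toy dimensions are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Self-attention: key = value = query = X (no learned projections).

    X: (seq_len, d) matrix of word vectors.
    Returns Y of the same shape; each y_t is a weighted mix of every
    position in X, so global context is captured in a single step.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)    # compare each x_t with all the original words
    weights = softmax(scores, axis=-1)
    return weights @ X               # weighted sum of the values

# Toy example: 4 "words" with 8-dimensional embeddings
X = np.random.randn(4, 8)
Y = self_attention(X)
print(Y.shape)  # (4, 8)
```

Note that, unlike an RNN, nothing here is sequential: every position attends to every other position in one matrix product, which is why it parallelizes well.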

Self-Attention

[Figure: a Query is scored against Key1–Key4 of the Source, and the resulting weights are applied to Value1–Value4.]

Calculation process:
• Step 1: calculate the similarity F(Q, K) between the query and each key to get raw scores s1–s4.
• Step 2: normalize the scores with SoftMax to obtain the attention weights.
• Step 3: take the weighted sum of the values with these weights to get the attention output.
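The three steps can be written out directly, again as a hedged NumPy sketch; using the dot product as the similarity F(Q, K) is one common choice (the slides do not fix a specific F), and all names and dimensions below are illustrative:

```python
import numpy as np

# Toy source: 4 key/value pairs and one query (dimensions are illustrative)
keys   = np.random.randn(4, 8)   # Key1..Key4
values = np.random.randn(4, 8)   # Value1..Value4
query  = np.random.randn(8)      # Q

# Step 1: similarity F(Q, K) between the query and each key -> raw scores s1..s4
scores = keys @ query            # dot-product similarity as F(Q, K)

# Step 2: SoftMax turns the raw scores into normalized attention weights
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Step 3: the weighted sum of the values is the attention output
output = weights @ values
print(weights.round(3), output.shape)  # weights sum to 1; output is (8,)
```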
