
"Deep Natural Language Processing" course slides (Natural language processing with deep learning) 08 Language Model & Distributed Representation (5/6)


Xi'an Jiaotong University
Natural language processing with deep learning
Language Model & Distributed Representation (5)
Chen Li  cli@xjtu.edu.cn  2023

Outlines
1. Self-attention
2. Transformer
3. Pre-training LM

Self-Attention

y_t = f(x_t, A, B)

• where A and B are another sequence (matrix)
• If we take A (key) = B (value) = X (query), then it is called self-attention.
• It means comparing x_t with all the original words and calculating y_t at last.
• This departs completely from the traditional RNN or CNN framework.
• It is faster and can directly capture global information (see the sketch below).
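To make the special case A (key) = B (value) = X (query) concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. The function name `self_attention`, the scaling by sqrt(d), and the toy dimensions are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Self-attention: key = value = query = X (no learned projections).

    X: (seq_len, d) matrix of word vectors.
    Returns Y of the same shape; each y_t is a weighted mix of every
    position in X, so global context is captured in a single step.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)    # compare each x_t with all the original words
    weights = softmax(scores, axis=-1)
    return weights @ X               # weighted sum of the values

# Toy example: 4 "words" with 8-dimensional embeddings
X = np.random.randn(4, 8)
Y = self_attention(X)
print(Y.shape)  # (4, 8)
```

Note that, unlike an RNN, nothing here is sequential: every position attends to every other position in one matrix product, which is why it parallelizes well.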

Self-Attention

[Figure: a Query is scored against Key1–Key4 of the Source, and the resulting weights are applied to Value1–Value4.]

Calculation process:
• Step 1: calculate the similarity F(Q, K) between the query and each key to get raw scores s1–s4.
• Step 2: normalize the scores with SoftMax to obtain the attention weights.
• Step 3: take the weighted sum of the values with these weights to get the attention output.
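The three steps can be written out directly, again as a hedged NumPy sketch; using the dot product as the similarity F(Q, K) is one common choice (the slides do not fix a specific F), and all names and dimensions below are illustrative:

```python
import numpy as np

# Toy source: 4 key/value pairs and one query (dimensions are illustrative)
keys   = np.random.randn(4, 8)   # Key1..Key4
values = np.random.randn(4, 8)   # Value1..Value4
query  = np.random.randn(8)      # Q

# Step 1: similarity F(Q, K) between the query and each key -> raw scores s1..s4
scores = keys @ query            # dot-product similarity as F(Q, K)

# Step 2: SoftMax turns the raw scores into normalized attention weights
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Step 3: the weighted sum of the values is the attention output
output = weights @ values
print(weights.round(3), output.shape)  # weights sum to 1; output is (8,)
```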
