Course slides for Natural Language Processing with Deep Learning: 08 Language Model & Distributed Representation (5/6)

Natural language processing with deep learning
Language Model & Distributed Representation (5)
Chen Li, cli@xjtu.edu.cn
Xi'an Jiaotong University, 2023

Outline
1. Self-attention
2. Transformer
3. Pre-training LM

Self-attention
• Self-Attention: $y_t = f(x_t, A, B)$, where $A$ and $B$ are another sequence (matrix).
• If we take A (key) = B (value) = X (query), it is called self-attention.
• That is, each $x_t$ is compared with every word of its own sequence $X$, and $y_t$ is computed from the result.
• It breaks completely with the traditional RNN or CNN framework.
• It is faster than an RNN, because all positions are computed in parallel rather than one step at a time, and each $y_t$ directly draws on global information from the whole sequence.
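The definition above can be sketched in plain Python. This is a minimal sketch that assumes $f$ uses dot-product scoring followed by a softmax-weighted sum over $B$. The slide does not fix $f$, and the names `f` and `self_attention` are hypothetical. Passing the same sequence $X$ as keys, values and queries gives self-attention.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def f(x_t, A, B):
    """Hypothetical attention function y_t = f(x_t, A, B).

    A holds the keys and B the values (one vector per position).
    Similarity is ASSUMED to be a plain dot product.
    """
    weights = softmax([dot(x_t, a) for a in A])
    dim = len(B[0])
    # y_t is the weighted sum of the value vectors in B.
    return [sum(w * b[j] for w, b in zip(weights, B)) for j in range(dim)]

def self_attention(X):
    # Self-attention: key = value = query = X.
    # Each y_t depends only on X, not on y_{t-1}, so these calls are
    # independent and could run in parallel.
    return [f(x_t, X, X) for x_t in X]

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = self_attention(X)
```

Each $y_t$ is computed from the whole of $X$ with no recurrence over earlier outputs. This gives the parallelism and direct access to global information described above.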

Self-attention
[Figure: Self-Attention viewed as a lookup. The Source consists of (Key, Value) pairs Key1/Value1 through Key4/Value4. A Query is compared against the keys to produce the Attention value.]
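The lookup in the figure can be written as the standard weighted-sum form of attention. Here $F$ is the query-key similarity function, and $L_x$ is the number of (Key, Value) pairs in the Source (4 in the figure). Normalizing the weights with SoftMax is assumed here, following the usual formulation:

$$
\mathrm{Attention}(\mathrm{Query}, \mathrm{Source}) = \sum_{i=1}^{L_x} a_i \, \mathrm{Value}_i,
\qquad
a_i = \frac{\exp\big(F(\mathrm{Query}, \mathrm{Key}_i)\big)}{\sum_{j=1}^{L_x} \exp\big(F(\mathrm{Query}, \mathrm{Key}_j)\big)}
$$

Unlike a hard dictionary lookup, which returns only the value whose key matches exactly, every value contributes in proportion to its weight $a_i$.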

Self-attention
[Figure: calculation process of attention. Step 1: the Query is scored against each of Key1 to Key4 with a similarity function F(Q, K), giving s1 to s4. Step 2: SoftMax normalizes the scores. Step 3: Value1 to Value4 are combined with the resulting weights to give the Attention value.]
Calculation process:
• Step 1: compute the similarity F(Q, K) between the query and each key; these scores determine the weights.
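The three steps in the figure can be traced in a short Python sketch. It assumes that the similarity function F is a dot product; the slide leaves F open, and cosine similarity or a small MLP are other common choices. The function names `similarity` and `attention` are hypothetical.

```python
import math

def similarity(query, key):
    # Step 1 scoring function F(Q, K), ASSUMED to be a dot product here.
    return sum(q * k for q, k in zip(query, key))

def attention(query, keys, values):
    # Step 1: one similarity score s_i per key.
    scores = [similarity(query, k) for k in keys]
    # Step 2: SoftMax turns the scores into weights a_i that sum to 1
    # (max subtracted first for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Step 3: the weighted sum of the values is the attention value.
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return scores, weights, output

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
values = [[10.0], [20.0], [30.0], [40.0]]
scores, weights, output = attention([1.0, 0.0], keys, values)
```

With this query, Key1 and Key3 score highest (1.0 each). Value1 and Value3 therefore receive the largest weights, but Value2 and Value4 still contribute a smaller share.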