Deep Natural Language Processing course lecture slides (Natural language processing with deep learning), 09 Language Model & Distributed Representation (6/6)

Xi'an Jiaotong University
Natural language processing with deep learning
Language Model & Distributed Representation (6)
Chen Li, cli@xjtu.edu.cn, 2023

Outline
1. Pre-training LM
2. GPT
3. BERT
4. T5


Advanced LM • Taxonomy of pre-training LM

Contextual?
• Non-Contextual: CBOW, Skip-Gram, GloVe
• Contextual: ELMo, GPT, BERT

Architectures
• LSTM: ELMo, CoVe
• Transformer Enc.: BERT, SpanBERT, XLNet, RoBERTa
• Transformer Dec.: GPT, GPT-2
• Transformer: MASS, BART, XNLG, mBART

Task Types
• Supervised: MT: CoVe
• Unsupervised/Self-Supervised:
  - LM: ELMo, GPT, GPT-2, UniLM
  - MLM: BERT, SpanBERT, RoBERTa, XLM-R; TLM: XLM; Seq2Seq MLM: MASS, T5
  - PLM: XLNet
  - DAE: BART
  - CTL: RTD: CBOW-NS, ELECTRA; NSP: BERT, UniLM; SOP: ALBERT, StructBERT
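For orientation (an addition, not from the slide): the "Transformer Enc." and "Transformer Dec." branches of this taxonomy correspond to different families of pretrained checkpoints in practice. A minimal sketch using the Hugging Face transformers library, where the checkpoint names are illustrative assumptions:

# Sketch: encoder-style vs decoder-style pre-trained LMs as released checkpoints.
# Assumes `pip install torch transformers`; checkpoint names are examples only.
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer

# Encoder-style PTM (BERT): bidirectional, pretrained with masked language modeling.
bert = AutoModel.from_pretrained("bert-base-uncased")

# Decoder-style PTM (GPT-2): left-to-right, pretrained with standard language modeling.
gpt2 = AutoModelForCausalLM.from_pretrained("gpt2")

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
out = bert(**tok("hello world", return_tensors="pt"))
print(out.last_hidden_state.shape)  # one contextual vector per input token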

Advanced LM • Taxonomy of pre-training LM (continued)

Extensions
• Knowledge-Enriched: ERNIE (THU), KnowBERT, K-BERT, SentiLR, KEPLER, WKLM
• Multilingual: XLU: mBERT, Unicoder, XLM, XLM-R, MultiFiT; XLG: MASS, mBART, XNLG
• Language-Specific: ERNIE (Baidu), BERT-wwm-Chinese, NEZHA, ZEN, BERTje, CamemBERT, FlauBERT, RobBERT
• Multi-Modal: Image: ViLBERT, LXMERT, VisualBERT, B2T2, VL-BERT; Video: VideoBERT, CBT; Speech: SpeechBERT
• Domain-Specific: SentiLR, BioBERT, SciBERT, PatentBERT

Model Compression
• Model Pruning: CompressingBERT
• Quantization: Q-BERT, Q8BERT
• Parameter Sharing: ALBERT
• Distillation: DistilBERT, TinyBERT, MiniLM
• Module Replacing: BERT-of-Theseus

Pre-training LM • Pretraining Language Models with three architectures

The neural architecture influences the type of pretraining, and natural use cases.

Decoders
• Language models! What we've seen so far.
• Nice to generate from; can't condition on future words.

Encoders
• Get bidirectional context – can condition on the future!
• Wait, how do we pretrain them?

Encoder-Decoders
• Good parts of decoders and encoders?
• What's the best way to pretrain them?
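A minimal sketch (an addition, not from the slides) of how pretraining examples differ across the three setups, using toy token lists; the sentinel-based span corruption is an illustrative, T5-style assumption rather than any particular model's exact recipe:

# Sketch: building pretraining examples for decoder, encoder, and encoder-decoder LMs.
tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Decoder (language model): predict each next word from the left-to-right prefix only.
def lm_examples(toks):
    return [(toks[:i], toks[i]) for i in range(1, len(toks))]

# Encoder (masked LM, BERT-style): mask some positions and predict them
# from the full bidirectional context.
def mlm_example(toks, mask_positions):
    inp = [("[MASK]" if i in mask_positions else t) for i, t in enumerate(toks)]
    targets = {i: toks[i] for i in mask_positions}
    return inp, targets

# Encoder-decoder (span corruption, T5-style sketch): drop a span from the input
# and train the decoder to reconstruct it after a sentinel token.
def span_corruption_example(toks, start, end):
    inp = toks[:start] + ["<X>"] + toks[end:]
    target = ["<X>"] + toks[start:end]
    return inp, target

print(lm_examples(tokens)[:2])                # (['the'], 'cat'), (['the', 'cat'], 'sat')
print(mlm_example(tokens, {2}))               # mask "sat"; predict it from both sides
print(span_corruption_example(tokens, 2, 4))  # remove "sat on"; decode it back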

Pre-training LM • Pretraining decoders

When using language-model-pretrained decoders, we can ignore that they were trained to model p(w_t | w_{1:t-1}).

We can finetune them by training a classifier on the last word's hidden state:
  h_1, ..., h_T = Decoder(w_1, ..., w_T)
  y ~ A h_T + b
where A and b are randomly initialized and specified by the downstream task. Gradients backpropagate through the whole network.

[Note how the linear layer hasn't been pretrained and must be learned from scratch.]
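A minimal PyTorch sketch of this finetuning recipe (an addition, not the slide's code), assuming the Hugging Face transformers GPT-2 checkpoint as the pretrained decoder; the classifier's A and b are freshly initialized, and every parameter receives gradients:

# Sketch: finetune a pretrained decoder with a classifier on the last word's hidden state.
# Assumes `pip install torch transformers`; "gpt2" is an example checkpoint only.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
decoder = AutoModel.from_pretrained("gpt2")             # pretrained parameters
classifier = nn.Linear(decoder.config.hidden_size, 2)   # A, b: randomly initialized

optimizer = torch.optim.AdamW(
    list(decoder.parameters()) + list(classifier.parameters()), lr=2e-5
)

batch = tokenizer(["this movie was great"], return_tensors="pt")
labels = torch.tensor([1])

# h_1, ..., h_T = Decoder(w_1, ..., w_T)
hidden = decoder(**batch).last_hidden_state   # shape: (batch, T, hidden_size)
h_T = hidden[:, -1, :]                        # last word's hidden state (no padding here)

# y ~ A h_T + b
logits = classifier(h_T)
loss = nn.functional.cross_entropy(logits, labels)

# Gradients backpropagate through the whole network, not just the new linear layer.
loss.backward()
optimizer.step()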
