
Deep Natural Language Processing course slides (Natural language processing with deep learning): 09 Language Model & Distributed Representation (6/6)



Natural language processing with deep learning
Language Model & Distributed Representation (6)
Chen Li, cli@xjtu.edu.cn, Xi'an Jiaotong University, 2023


Outline
1. Pre-training LM
2. GPT
3. BERT
4. T5


Advanced LM l Taxonomy of pre-training LM

Contextual?
• Non-Contextual: CBOW, Skip-Gram, GloVe
• Contextual: ELMo, GPT, BERT

Architectures
• LSTM: ELMo, CoVe
• Transformer Enc.: BERT, SpanBERT, XLNet, RoBERTa
• Transformer Dec.: GPT, GPT-2
• Transformer: MASS, BART, XNLG, mBART

Task Types
• Supervised: MT: CoVe
• Unsupervised/Self-Supervised:
  - LM: ELMo, GPT, GPT-2, UniLM
  - MLM: BERT, SpanBERT, RoBERTa, XLM-R
    - TLM: XLM
    - Seq2Seq MLM: MASS, T5
  - PLM: XLNet
  - DAE: BART
  - CTL: RTD (CBOW-NS, ELECTRA), NSP (BERT, UniLM), SOP (ALBERT, StructBERT)
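Among the task types above, the two most widely used self-supervised objectives are standard language modeling (LM, as in GPT) and masked language modeling (MLM, as in BERT). The minimal Python sketch below, with made-up token IDs and a hypothetical MASK_ID, only illustrates how the two objectives derive training targets from the same token sequence; it is an assumption for illustration, not material from the slides.

```python
import random

# Toy token IDs standing in for the sentence "the cat sat on the mat".
tokens = [12, 305, 77, 41, 12, 518]
MASK_ID = 0  # hypothetical [MASK] token id

# Causal LM (GPT-style): predict the next token at every position.
lm_inputs = tokens[:-1]    # [12, 305, 77, 41, 12]
lm_targets = tokens[1:]    # [305, 77, 41, 12, 518]

# Masked LM (BERT-style): corrupt ~15% of positions, predict only those.
mlm_inputs, mlm_targets = [], []
for tok in tokens:
    if random.random() < 0.15:
        mlm_inputs.append(MASK_ID)   # replace with [MASK]
        mlm_targets.append(tok)      # model must recover the original token
    else:
        mlm_inputs.append(tok)
        mlm_targets.append(-100)     # position ignored by the loss

print(lm_inputs, lm_targets)
print(mlm_inputs, mlm_targets)
```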

Advanced LM l Taxonomy of pre-training LM

Extensions
• Knowledge-Enriched: ERNIE (THU), KnowBERT, K-BERT, SentiLR, KEPLER, WKLM
• Multilingual:
  - XLU: mBERT, Unicoder, XLM, XLM-R, MultiFit
  - XLG: MASS, mBART, XNLG
• Language-Specific: ERNIE (Baidu), BERT-wwm-Chinese, NEZHA, ZEN, BERTje, CamemBERT, FlauBERT, RobBERT
• Multi-Modal:
  - Image: ViLBERT, LXMERT, VisualBERT, B2T2, VL-BERT
  - Video: VideoBERT, CBT
  - Speech: SpeechBERT
• Domain-Specific: SentiLR, BioBERT, SciBERT, PatentBERT

Model Compression
• Model Pruning: CompressingBERT
• Quantization: Q-BERT, Q8BERT
• Parameter Sharing: ALBERT
• Distillation: DistilBERT, TinyBERT, MiniLM
• Module Replacing: BERT-of-Theseus
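Several of the compression techniques listed above (distillation, parameter sharing) are available as ready-made checkpoints. As a rough sketch, assuming the Hugging Face transformers library is installed (the library and checkpoint names are assumptions, not mentioned in the slides), one can compare a distilled model with its full-size counterpart:

```python
from transformers import AutoModel

def num_params(model):
    """Count trainable and non-trainable parameters of a model."""
    return sum(p.numel() for p in model.parameters())

# Common public checkpoints, assumed here only for illustration.
bert = AutoModel.from_pretrained("bert-base-uncased")
distil = AutoModel.from_pretrained("distilbert-base-uncased")

print(f"BERT:       {num_params(bert) / 1e6:.0f}M parameters")
print(f"DistilBERT: {num_params(distil) / 1e6:.0f}M parameters")  # noticeably smaller
```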

Pre-training LM l Pretraining Language Models with three architectures

The neural architecture influences the type of pretraining, and natural use cases.

Decoders
• Language models! What we've seen so far.
• Nice to generate from; can't condition on future words.

Encoders
• Gets bidirectional context – can condition on the future!
• Wait, how do we pretrain them?

Encoder-Decoders
• Good parts of decoders and encoders?
• What's the best way to pretrain them?
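Each architecture family corresponds to a well-known line of models (decoder-only GPT, encoder-only BERT, encoder-decoder T5). As a minimal sketch, assuming the Hugging Face transformers library is available (the library and checkpoint names are assumptions, not part of the slides), the three types can be instantiated as follows:

```python
from transformers import (
    AutoModelForCausalLM,    # decoder-only: left-to-right language model
    AutoModelForMaskedLM,    # encoder-only: bidirectional masked language model
    AutoModelForSeq2SeqLM,   # encoder-decoder: sequence-to-sequence model
)

decoder = AutoModelForCausalLM.from_pretrained("gpt2")                # GPT-style
encoder = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")   # BERT-style
enc_dec = AutoModelForSeq2SeqLM.from_pretrained("t5-small")           # T5-style
```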

Pre-training LM l Pretraining decoders

When using language-model-pretrained decoders, we can ignore that they were trained to model $p(w_t \mid w_{1:t-1})$.

We can finetune them by training a classifier on the last word's hidden state:

$h_1, \dots, h_T = \text{Decoder}(w_1, \dots, w_T)$
$y \sim A h_T + b$

where $A$ and $b$ are randomly initialized and specified by the downstream task. Gradients backpropagate through the whole network.

[Note how the linear layer hasn't been pretrained and must be learned from scratch.]
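A minimal PyTorch sketch of this finetuning setup is given below. The toy stand-in decoder, its dimensions, and the class names are assumptions for illustration; the point is that only the linear layer ($A$, $b$) is freshly initialized, while gradients flow through the entire pretrained decoder.

```python
import torch
import torch.nn as nn

class DecoderClassifier(nn.Module):
    """Linear classifier on the last hidden state of a pretrained decoder: y = A h_T + b."""
    def __init__(self, pretrained_decoder, hidden_size, num_classes):
        super().__init__()
        self.decoder = pretrained_decoder                   # pretrained weights, finetuned
        self.linear = nn.Linear(hidden_size, num_classes)   # A, b: randomly initialized

    def forward(self, tokens):
        hidden = self.decoder(tokens)       # h_1, ..., h_T with shape (batch, T, hidden)
        h_last = hidden[:, -1, :]           # last word's hidden state h_T
        return self.linear(h_last)          # logits y = A h_T + b

# Stand-in "decoder" so the sketch runs on its own; in practice this would be
# a language-model-pretrained Transformer decoder loaded from a checkpoint.
toy_decoder = nn.Sequential(
    nn.Embedding(1000, 64),
    nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
        num_layers=2,
    ),
)

model = DecoderClassifier(toy_decoder, hidden_size=64, num_classes=2)
tokens = torch.randint(0, 1000, (8, 16))   # batch of 8 sequences, 16 tokens each
labels = torch.randint(0, 2, (8,))         # downstream classification labels

loss = nn.functional.cross_entropy(model(tokens), labels)
loss.backward()  # gradients reach both the new linear head and the whole decoder
```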
