Tsinghua University: Mandarin Pronunciation Variation Modeling

Mandarin Pronunciation Variation Modeling
Thomas Fang Zheng
Center of Speech Technology, State Key Lab of Intelligent Technology and Systems
Department of Computer Science & Technology, Tsinghua University
fzheng@sp.cs.tsinghua.edu.cn, http://sp.cs.tsinghua.edu.cn/~fzheng/
NCMMSC’01, 20-22 Nov. 2001, Shenzhen, China

Slide 2: Motivation
❑ In spontaneous speech, the pronunciation of individual words varies. There are often
  ❖ Sound changes, and
  ❖ Phone changes.
  ❖ Changes include insertion, deletion and substitution.
  ❖ For Chinese:
    ➢ an additional accent problem even when people are speaking Mandarin, due to different dialect backgrounds (Chinese has 7 major dialects)
    ➢ colloquialism, grammar, style
❑ Goal: modelling the pronunciation variations
  ❖ Establishing a corpus with spontaneous phenomena, because we need to know what the canonical phones change to.
  ❖ Finding solutions to pronunciation modelling, both theoretically and practically.

Slide 3: Overview

| Authors | Paper (Source) | Database | Method | WER |
| T. Fukada, Y. Sagisaka (ATR, Japan) | Automatic generation of a pronunciation dictionary based on a pronunciation network (EuroSpeech’97) | Japanese Spontaneous | ANN Prediction | 75.54 %, 67.44 % |
| M-K Liu, Bo Xu (NLPR, China) | Mandarin accent adaptation based on CI/CD pronunciation modeling (ICASSP’2000) | Shanghai Accent (Intel) | Confusion Matrix | 45.13 %, 40.24 % |
| M. Saraclar (CLSP, JHU), H. Nock (CUED, Cambridge, UK) | Pronunciation modeling by sharing Gaussian densities across phonetic models (EuroSpeech’99) | Switchboard | Gaussian Sharing | 50.10 %, 48.70 % |
| K. Ma, G. Zavaliagkos (GTE / BBN, USA) | Pronunciation modeling for large vocabulary conversational speech recognition (ICSLP’98) | Switchboard, Callhome | Lexical Adaptation | 54.60 %, 53.49 % |
| M. Riley (AT&T Labs), W. Byrne (CLSP, JHU) | Stochastic pronunciation modelling from hand-labelled phonetic corpora (Speech Communication, 1999 (29)) | TIMIT + ICSI | Decision Tree | 44.66 %, 44.05 % |
| D. Povey, P.C. Woodland (CUED, Cambridge, UK) | Improved discriminative training techniques for large vocabulary continuous speech recognition (ICASSP’2001) | NAB, Switchboard | Discriminant Training | 46.60 %, 44.30 % |
| T. Hain, P.C. Woodland (CUED, Cambridge, UK) | New features in the CU-HTK system for transcription of conversational telephone speech (ICASSP’2001) | NIST Hub5E (Telephone) | VTLN, MMIE | 51.60 %, 47.00 % |

Slide 4: Necessity to establish a new annotated spontaneous speech corpus
❑ The existing databases (incl. Broadcast News, CallHome, CallFriend, ...) do not cover all the Chinese spoken language phenomena
  ❖ Sound changes: voiced, unvoiced, nasalization, ...
  ❖ Phone changes: retroflexed, OOV-phoneme, ...
❑ The existing databases do not contain pronunciation variation information for use in bootstrap training
❑ A Chinese Annotated Spontaneous Speech (CASS) Corpus was established before WS00 on LSP at JHU
  ❖ Completely spontaneous (discourses, lectures, ...)
  ❖ Remarkable background noise, accent background, ...
  ❖ Recorded onto tapes and then digitized

Slide 5: Chinese Annotated Spontaneous Speech (CASS) Corpus
❑ CASS w/ Five-Tier Transcription
  ❖ Character Level: base form
  ❖ Syllable (or Pinyin) Level (w/ tone): base form
  ❖ Initial/Final (IF) Level: w/ time boundary for base form
  ❖ SAMPA-C Level: surface form
  ❖ Miscellaneous Level: used for garbage modeling
    ➢ Lengthening, breathing, laughing, coughing, disfluency, noise, silence, murmur (unclear), modal, smack, non-Chinese
  ❖ Example (我们多认识点人, "let's get to know a few more people")
    Character:     我 们 多 认 识 点 人
    Syllable:      wo3 men0 duo1 ren4 shi0 dian3 ren2
    CASS Syllable: wo3 men0 duo1 ren4 shi0 dianr3 ren2
    IF:            uo | m @_n | t uo | z` @_n | s` i` | t iE_n | z` @_n
    GIF:           uo @_n t_v uo z` @_n s`_v t_v ia` z` @_n
    Misc:          noise, mum
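The gap between the base-form (IF) tier and the surface-form (GIF) tier in the example can be measured as a token-level edit distance, counting the insertions, deletions and substitutions named on the Motivation slide. The sketch below is an illustration only, not part of the CASS tools: the function name `edit_distance` and the unit edit costs are assumptions, and the two token lists are copied from the example.

```python
def edit_distance(base, surface):
    """Levenshtein distance between two token lists, with unit costs (an assumption)."""
    prev = list(range(len(surface) + 1))  # distance from empty base prefix
    for i, b in enumerate(base, start=1):
        curr = [i]  # deleting the first i base tokens
        for j, s in enumerate(surface, start=1):
            curr.append(min(
                prev[j] + 1,             # delete base token b
                curr[j - 1] + 1,         # insert surface token s
                prev[j - 1] + (b != s),  # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


# Tiers copied from the slide example; tokens are separated by spaces.
IF_TIER = "uo m @_n t uo z` @_n s` i` t iE_n z` @_n".split()
GIF_TIER = "uo @_n t_v uo z` @_n s`_v t_v ia` z` @_n".split()

print(edit_distance(IF_TIER, GIF_TIER))
```

Under these costs the printed value counts how many canonical units were deleted or replaced in the surface form; a real system would more likely weight the edits by phonetic similarity.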

Slide 6: SAMPA-C: Machine Readable IPA
❑ Phonologic Consonants - 23
❑ Phonologic Vowels - 9
❑ Initials - 21
❑ Finals - 38
❑ Retroflexed Finals - 38
❑ Tones and Silences
❑ Sound Changes
❑ Spontaneous Phenomenon Labels

Slide 7: Key Points in PM (1)
❑ Choosing and generating the speech recognition unit (SRU) set
  ❖ So as to describe the phone changes and sound changes well
  ❖ Could be syllable, semi-syllable, or INITIAL/FINAL
❑ Constructing a multi-pronunciation lexicon (MPL)
  ❖ A syllable-to-SRU lexicon to reflect the relation between the grammatical units and acoustic models
❑ Acoustically modeling spontaneous speech
  ❖ Theoretical framework
  ❖ CD modeling; confusion matrix; data-driven

Slide 8: Key Points in PM (2)
❑ Customizing the decoding algorithm according to the new lexicon
  ❖ Improved time-synchronous search algorithm to reduce the path expansion (caused by CD modeling)
  ❖ A*-based tree-trellis search algorithm to score multiple pronunciation variations simultaneously in the path
❑ Modifying the statistical language model

$$\hat{W} = \arg\max_{W} P(X \mid W)\, P(W)$$

$$\hat{W} = \arg\max_{W = \mathrm{Baseform}(V)} P(X \mid V)\, P(V)$$

$$\hat{W} = \arg\max_{W = \mathrm{Baseform}(V)} P(X \mid V)\, P(V \mid W)\, P(W)$$
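The third decoding rule can be illustrated with a toy search over (base form, surface form) pairs. This is a minimal sketch, assuming X is the acoustic observation, W a base-form hypothesis and V one of its surface forms; the function `decode` and every probability in the tables below are made up for illustration and do not come from the slides.

```python
import math


def decode(acoustic, pron, lm):
    """Return the (W, V) pair maximizing P(X|V) * P(V|W) * P(W), plus its log score."""
    best, best_score = None, -math.inf
    for word, p_w in lm.items():
        for surface, p_v_given_w in pron[word].items():
            # Work in the log domain so that products become sums.
            score = (math.log(acoustic[surface])
                     + math.log(p_v_given_w)
                     + math.log(p_w))
            if score > best_score:
                best, best_score = (word, surface), score
    return best, best_score


# Hypothetical toy values, for illustration only.
lm = {"chang": 0.6, "zhang": 0.4}                       # P(W)
pron = {                                                # P(V | W)
    "chang": {"ts`_h AN": 0.8, "ts`_v AN": 0.2},
    "zhang": {"ts` AN": 0.9, "ts`_v AN": 0.1},
}
acoustic = {"ts`_h AN": 0.01, "ts`_v AN": 0.05, "ts` AN": 0.02}  # P(X | V)

best, log_score = decode(acoustic, pron, lm)
print(best, math.exp(log_score))
```

Note that the surface form with the highest acoustic likelihood does not necessarily win: the pronunciation weight P(V | W) and the language model P(W) can outweigh it, which is the point of the third formula.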

Slide 9: Establishment of Multi-Pron. Lexicon
❑ Two major approaches
  ❖ Defined by linguists and phoneticians
  ❖ Data-driven: confusion matrix, rewrite rules, decision tree, ...
❑ Our method:
  ❖ Find all possible pronunciations in SAMPA-C from the database
  ❖ Reduce the size according to occurrence frequencies
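The two steps of the method (collect every observed pronunciation, then reduce the lexicon by frequency) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function `build_lexicon`, the relative-frequency threshold `min_prob`, the renormalization after pruning, and the toy observations are all hypothetical.

```python
from collections import Counter, defaultdict


def build_lexicon(observations, min_prob=0.05):
    """Build a pruned multi-pronunciation lexicon from (syllable, surface) pairs."""
    counts = defaultdict(Counter)
    for syllable, surface in observations:
        counts[syllable][surface] += 1

    lexicon = {}
    for syllable, ctr in counts.items():
        total = sum(ctr.values())
        # Keep variants whose relative frequency reaches the threshold.
        kept = {s: c for s, c in ctr.items() if c / total >= min_prob}
        if not kept:
            # Fall back to the single most frequent variant.
            surface, count = ctr.most_common(1)[0]
            kept = {surface: count}
        kept_total = sum(kept.values())
        # Renormalize so the kept variants sum to 1 (a design assumption).
        lexicon[syllable] = {s: c / kept_total for s, c in kept.items()}
    return lexicon


# Hypothetical observations: 20 tokens of the syllable "chang".
observations = ([("chang", "ts`_h AN")] * 17
                + [("chang", "ts`_h_v AN")] * 2
                + [("chang", "iAN")] * 1)

lexicon = build_lexicon(observations, min_prob=0.08)
print(lexicon["chang"])
```

With the threshold at 0.08 the single "iAN" token (relative frequency 0.05) is dropped and the remaining two variants are rescaled. Whether to renormalize or to keep the raw frequencies is a choice the slides do not specify.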

Slide 10: Surface form for IF and Syllable
❑ Learning pronunciations
  ❖ Definition of Generalized Initial-Finals (GIFs): collect all of them and choose the most frequent ones as GIFs.
    ➢ z  ts     : canonical
    ➢ z  ts_v   : voiced
    ➢ z  ts`    : changed to ‘zh’
    ➢ z  ts`_v  : changed to voiced ‘zh’
    ➢ e  7      : canonical
    ➢ e  7`     : retroflexed or changed to ‘er’
    ➢ e  @      : changed
  ❖ Definition of Generalized Syllables (GSs), the lexicon: define them according to the GIF set.
    ➢ chang [0.7850] ts`_h AN
    ➢ chang [0.1215] ts`_h_v AN
    ➢ chang [0.0280] ts`_v AN
    ➢ chang [0.0187] AN
    ➢ chang [0.0187] z` AN
    ➢ chang [0.0093] iAN
    ➢ chang [0.0093] ts_h AN
    ➢ chang [0.0093] ts`_h UN
  ❖ Probabilistic lexicon: P ( [GIFi] GIFf | Syllable )
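One way to reason about how far such a probabilistic lexicon can be reduced is to look at the probability mass covered by the n most frequent variants. The sketch below uses the "chang" entries exactly as listed on the slide; the helper `coverage` is a hypothetical name introduced here, not something the slides define.

```python
# (probability, surface form) pairs for "chang", copied from the slide.
CHANG = [
    (0.7850, "ts`_h AN"),
    (0.1215, "ts`_h_v AN"),
    (0.0280, "ts`_v AN"),
    (0.0187, "AN"),
    (0.0187, "z` AN"),
    (0.0093, "iAN"),
    (0.0093, "ts_h AN"),
    (0.0093, "ts`_h UN"),
]


def coverage(entries, n):
    """Total probability of the n most probable variants."""
    top = sorted(entries, reverse=True)[:n]
    return sum(p for p, _ in top)


for n in (1, 2, 3, len(CHANG)):
    print(n, round(coverage(CHANG, n), 4))
```

The canonical form and its voiced variant together already account for roughly 0.91 of the mass, which suggests why pruning by occurrence frequency loses little. The listed probabilities sum to 0.9998 rather than 1, presumably because of rounding on the slide.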