Tsinghua University: Dialectal Chinese Speech Recognition (PPT lecture notes)

Dialectal Chinese Speech Recognition
Thomas Fang Zheng
Center for Speech and Language Technologies, Tsinghua University
Aug. 24, 2007 @ Cambridge University, UK

Motivation | Goal | Knowledge | Data Collection | Workshop | Conclusion I | SDPBMM | Conclusion II

Outline
❑ Motivation
❑ Dialectal Chinese database collection
  ❖ Wu
  ❖ Min
  ❖ Chuan
❑ Approaches
  ❖ Chinese syllable mapping
  ❖ Lexicon adaptation
  ❖ State-dependent phoneme-based model merging (SDPBMM)
  ❖ Integration of SDPBMM with adaptation
❑ Remarks

Motivation
❑ Chinese ASR faces an issue that is more severe than in any other language: dialect.
❑ There are 8 major dialectal regions in addition to Mandarin (Northern China):
  ❖ Wu (Southern Jiangsu, Zhejiang, and Shanghai);
  ❖ Yue (Guangdong, Hong Kong, Nanning Guangxi);
  ❖ Min (Fujian, Shantou Guangdong, Haikou Hainan, Taipei Taiwan);
  ❖ Hakka (Meixian Guangdong, Hsin-chu Taiwan);
  ❖ Gan (Jiangxi);
  ❖ Xiang (Hunan);
  ❖ Hui (Anhui);
  ❖ Jin (Shanxi, Hohhot Inner Mongolia).
❑ These can be further divided into over 40 sub-categories.

[Figure: Map of Chinese dialects (中国汉语方言图)]

❑ Chinese dialects share the same written language:
  ❖ the same Chinese pinyin set (canonically),
  ❖ the same Chinese character set (canonically), and
  ❖ the same vocabulary (canonically).
❑ Standard Chinese (known as Putonghua, or PTH) is widely spoken in most regions of China.
❑ However, speech is strongly influenced by native dialects. Most Chinese people speak both standard Chinese and their own dialect, which results in dialectal Chinese: Putonghua influenced by the native dialect.
❑ In dialectal Chinese:
  ❖ Word usage, pronunciation, syntax, and grammar vary with the speaker's dialect.
  ❖ ASR relies to a great extent on consistent pronunciation and usage of words within a language.
  ❖ ASR systems built to process PTH therefore perform poorly for the great majority of the population.

Research Goal
❑ To develop a general framework that models the following in dialectal Chinese ASR tasks:
  ❖ phonetic variability,
  ❖ lexical variability, and
  ❖ pronunciation variability.
❑ To find suitable methods for modifying the baseline PTH recognizer into a dialectal Chinese recognizer for the specific dialect of interest. These methods employ:
  ❖ dialect-related knowledge (syllable mapping, cross-dialect synonyms, …), and
  ❖ training/adaptation data (in relatively small quantities).
❑ Expectation: the resulting recognizer should also work for PTH; in other words, it should handle a mixture of PTH and dialectal Chinese well.
❑ This proposal was selected as one of three projects for the 2003 Johns Hopkins University Summer Workshop, out of dozens of proposals from universities and companies around the world. The workshop was postponed to 2004 due to SARS.

Dialectal Chinese Speech Recognition Framework
[Diagram: Standard Chinese Speech Recognizer + Dialectal Chinese Related Knowledge & Resources → Dialectal Chinese Speech Recognizer]

❑ For practical reasons, during the summer we focused on one specific dialect, the Wu dialect (Shanghai area). The target language was Wu dialectal Chinese (WDC for short).
❑ Why the Wu dialect?
  ❖ Population: more than 70 million people use the Wu dialect, the second most widely spoken dialect in China.
  ❖ Economy: the Wu region includes Shanghai, one of the most developed cities in China.
  ❖ The Wu dialect is a fully developed language:
    ▪ Its syntax is very complex.
    ▪ Its vocabulary is even larger than that of Mandarin.
    ▪ Many literary masterpieces were historically influenced by the Wu dialect.

Number of phonemes:
             Wu    Mandarin    Cantonese
Phoneme #    50    37          <33

Useful Dialect-Related Knowledge
❑ Chinese Syllable Mapping (CSM)
  ❖ The CSM is dialect-related.
  ❖ Two types:
    ▪ Word-independent CSM: e.g. in Southern Chinese, Initial mappings include zh→z, ch→c, sh→s, n→l, and so on, and Final mappings include eng→en, ing→in, and so on.
    ▪ Word-dependent CSM: e.g. in dialectal Chuan Chinese, the pinyin 'guo2' changes to 'gui0' in the word '中国 (China)', but only the tone changes in the word '过去 (past)'.
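The two CSM types above can be sketched as a small lookup procedure. This is only an illustration under stated assumptions: the rule values come from the slide, but the table layout, the initials list, and the names `split_syllable` and `map_syllable` are hypothetical and are not the authors' implementation. The tone change for '过去' is not given on the slide, so it is left out.

```python
# Word-independent rules quoted on the slide (table layout is an assumption).
INITIAL_MAP = {"zh": "z", "ch": "c", "sh": "s", "n": "l"}
FINAL_MAP = {"eng": "en", "ing": "in"}

# Word-dependent exceptions, keyed by (word, canonical syllable).
# Only the one example stated on the slide is included.
WORD_DEPENDENT = {("中国", "guo2"): "gui0"}

# Pinyin initials; two-letter initials come first so they match before z/c/s.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def split_syllable(syllable):
    """Split a toned pinyin syllable into (initial, final, tone)."""
    tone = ""
    if syllable and syllable[-1].isdigit():
        syllable, tone = syllable[:-1], syllable[-1]
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):], tone
    return "", syllable, tone  # zero-initial syllable such as "an"


def map_syllable(syllable, word=None):
    """Map a canonical PTH syllable to its assumed dialectal form."""
    # A word-dependent rule, if present, takes priority.
    if (word, syllable) in WORD_DEPENDENT:
        return WORD_DEPENDENT[(word, syllable)]
    # Otherwise apply the word-independent Initial and Final rules.
    initial, final, tone = split_syllable(syllable)
    return INITIAL_MAP.get(initial, initial) + FINAL_MAP.get(final, final) + tone


print(map_syllable("sheng1"))            # word-independent rules only
print(map_syllable("guo2", word="中国"))  # word-dependent rule applies
```

Checking the word-dependent table first reflects the slide's point that the same syllable can map differently depending on the word it occurs in.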

❖ The CSM is not exact. For a mapping A→B, the resulting pronunciation is usually not exactly B but something quite similar to B, closer to B than to any other syllable.
  [Figure: A maps to B, with variants B1, B2, B3, B4 around B]
  ▪ Bi is a variation of B, such as: nasalization, centralization, voiced, voiceless, rounding, syllabic, pharyngealization, aspiration.
  [Figure: Standard Chinese Syllable Set (kei, kuo, kui, ...) versus Chuan Dialect: ke in [克]服 and 上[课]; kuo, kui in [扩]大 and [魁]梧]
❖ The CSM could be N→1, 1→N, or crossed.
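One consequence of N→1 and 1→N mappings is that a recognizer cannot undo the mapping unambiguously. The sketch below inverts a small forward table to list the canonical syllables that could lie behind an observed dialectal syllable. The entries in `FORWARD` are hypothetical illustrations built from the zh→z rule and the 'guo2'/'gui0' example; they are not data from the talk, and the function names are assumptions.

```python
from collections import defaultdict

# Hypothetical forward table: canonical PTH syllable -> possible dialectal forms.
FORWARD = {
    "zhi1": {"zi1"},           # zh -> z
    "zi1": {"zi1"},            # unchanged, so zhi1 and zi1 collapse (N -> 1)
    "guo2": {"guo2", "gui0"},  # one canonical syllable, two forms (1 -> N)
}


def invert(forward):
    """Build dialectal form -> set of canonical syllables."""
    backward = defaultdict(set)
    for canonical, surfaces in forward.items():
        for surface in surfaces:
            backward[surface].add(canonical)
    return dict(backward)


BACKWARD = invert(FORWARD)


def candidates(surface):
    """Canonical syllables that may underlie an observed dialectal syllable."""
    # Syllables absent from the table are assumed to be unchanged.
    return BACKWARD.get(surface, {surface})


print(sorted(candidates("zi1")))   # two canonical candidates: an N -> 1 case
print(sorted(candidates("gui0")))  # a single candidate
```

In a pronunciation lexicon, each candidate set would correspond to alternative pronunciations that the decoder has to keep open.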