Speaker Adaptation for Mandarin Syllable Recognition Based on Semi-Continuous Density HMM

Master's === National Chiao Tung University === Institute of Computer Science and Information Engineering === 83 === A speaker-dependent speech recognition system achieves a high recognition rate, but it needs a lot of speaker-specific training data. A speaker-independent (or multi-speaker) system needs no tr...

Bibliographic Details
Main Authors: Chien-Hung Chen, 陳健宏
Other Authors: Chi-Min Liu
Format: Others
Language: zh-TW
Published: 1995
Online Access: http://ndltd.ncl.edu.tw/handle/66731320553819607466
id ndltd-TW-083NCTU0392064
record_format oai_dc
spelling ndltd-TW-083NCTU0392064 2015-10-13T12:53:37Z http://ndltd.ncl.edu.tw/handle/66731320553819607466 Speaker Adaptation for Mandarin Syllable Recognition Based on Semi-Continuous Density HMM 利用半連續型隱藏式馬可夫模型建立的語者調適之中文語音辨識系統 Chien-Hung Chen 陳健宏 Master's National Chiao Tung University Institute of Computer Science and Information Engineering 83 A speaker-dependent speech recognition system achieves a high recognition rate, but it needs a lot of speaker-specific training data. A speaker-independent (or multi-speaker) system needs no training data from a new speaker, but it usually cannot achieve satisfactory performance. A speaker-adaptive system uses the existing knowledge from a reliably trained reference system, so that a small amount of training data from a new speaker is sufficient to reach the performance of a speaker-dependent system. In this thesis, we consider the application of speaker adaptation techniques to Mandarin speech. The vocabulary we study has 76 syllables, which include 19 INITIALs and 4 FINALs from the confusable sets of Mandarin syllables. For the reference systems in speaker adaptation, we create speaker-dependent and speaker-independent systems based on the semi-continuous density hidden Markov model (SCHMM). The speaker-dependent system has an average recognition rate of 90.46% and the speaker-independent system 58.97%. On the basis of the two reference systems, we study Bayesian adaptation techniques with the forward-backward training procedure. We apply the adaptation techniques to adjust the codebooks, mixture weights, and transition probabilities in the SCHMM. Experimental results show that, with only one training token, the adaptation procedure outperforms the speaker-independent system, raising the recognition rate from 58.97% to 76.65%. When 3 training tokens are used, the recognition rate approaches that of the speaker-dependent system. With 6 training tokens, the recognition rate exceeds that of the speaker-dependent system. Chi-Min Liu 劉啟民 1995 degree thesis ; thesis 70 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Chiao Tung University === Institute of Computer Science and Information Engineering === 83 === A speaker-dependent speech recognition system achieves a high recognition rate, but it needs a lot of speaker-specific training data. A speaker-independent (or multi-speaker) system needs no training data from a new speaker, but it usually cannot achieve satisfactory performance. A speaker-adaptive system uses the existing knowledge from a reliably trained reference system, so that a small amount of training data from a new speaker is sufficient to reach the performance of a speaker-dependent system. In this thesis, we consider the application of speaker adaptation techniques to Mandarin speech. The vocabulary we study has 76 syllables, which include 19 INITIALs and 4 FINALs from the confusable sets of Mandarin syllables. For the reference systems in speaker adaptation, we create speaker-dependent and speaker-independent systems based on the semi-continuous density hidden Markov model (SCHMM). The speaker-dependent system has an average recognition rate of 90.46% and the speaker-independent system 58.97%. On the basis of the two reference systems, we study Bayesian adaptation techniques with the forward-backward training procedure. We apply the adaptation techniques to adjust the codebooks, mixture weights, and transition probabilities in the SCHMM. Experimental results show that, with only one training token, the adaptation procedure outperforms the speaker-independent system, raising the recognition rate from 58.97% to 76.65%. When 3 training tokens are used, the recognition rate approaches that of the speaker-dependent system. With 6 training tokens, the recognition rate exceeds that of the speaker-dependent system.
author2 Chi-Min Liu
author_facet Chi-Min Liu
Chien-Hung Chen
陳健宏
author Chien-Hung Chen
陳健宏
spellingShingle Chien-Hung Chen
陳健宏
Speaker Adaptation for Mandarin Syllable Recognition Based on Semi-Continuous Density HMM
author_sort Chien-Hung Chen
title Speaker Adaptation for Mandarin Syllable Recognition Based on Semi-Continuous Density HMM
publishDate 1995
url http://ndltd.ncl.edu.tw/handle/66731320553819607466
_version_ 1716868648158625792
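
The description in this record mentions Bayesian adaptation of SCHMM mixture weights using counts from the forward-backward procedure. The following is only a minimal sketch of one common MAP-style update, not the formula used in the thesis: the function name `map_adapt_weights`, the prior strength `tau`, and the example numbers are all assumptions made for illustration. It interpolates the reference system's weights with soft occupation counts collected from a new speaker's adaptation data.

```python
def map_adapt_weights(prior_weights, counts, tau=10.0):
    """MAP-style update of mixture weights (illustrative, hypothetical helper).

    prior_weights: mixture weights of the reference system (sum to 1).
    counts: soft occupation counts per codeword, e.g. accumulated by
            forward-backward on the new speaker's tokens.
    tau: assumed prior strength; larger values trust the reference more.
    """
    if len(prior_weights) != len(counts):
        raise ValueError("prior_weights and counts must have equal length")
    total = tau + sum(counts)
    # Each adapted weight is a count-weighted blend of prior and new data.
    return [(tau * w + c) / total for w, c in zip(prior_weights, counts)]


# Hypothetical numbers: three codewords, ten frames of adaptation data.
reference = [0.5, 0.3, 0.2]
soft_counts = [0.0, 4.0, 6.0]
adapted = map_adapt_weights(reference, soft_counts, tau=10.0)
print(adapted)
```

With no adaptation data the update returns the reference weights unchanged, and as the counts grow the result moves toward the new speaker's relative frequencies, which matches the qualitative behavior the description reports as the number of training tokens increases.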