Speaker Adaptation for Mandarin Syllable Recognition Based on Semi-Continuous Density HMM
Master's === National Chiao Tung University === Institute of Computer Science and Information Engineering === 83 === A speaker-dependent speech recognition system achieves a high recognition rate, but it needs a large amount of speaker-specific training data. A speaker-independent (or multi-speaker) system needs no tr...
Main Authors: | Chien-Hung Chen, 陳健宏 |
---|---|
Other Authors: | Chi-Min Liu |
Format: | Others |
Language: | zh-TW |
Published: | 1995 |
Online Access: | http://ndltd.ncl.edu.tw/handle/66731320553819607466 |
id |
ndltd-TW-083NCTU0392064 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-083NCTU0392064 2015-10-13T12:53:37Z http://ndltd.ncl.edu.tw/handle/66731320553819607466 Speaker Adaptation for Mandarin Syllable Recognition Based on Semi-Continuous Density HMM 利用半連續型隱藏式馬可夫模型建立的語者調適之中文語音辨識系統 Chien-Hung Chen 陳健宏 Master's National Chiao Tung University Institute of Computer Science and Information Engineering 83 A speaker-dependent speech recognition system achieves a high recognition rate, but it needs a large amount of speaker-specific training data. A speaker-independent (or multi-speaker) system needs no training data from new speakers, but it usually cannot achieve satisfactory performance. A speaker-adaptive system uses the existing knowledge of a reliably trained reference system, so that a small amount of training data from a new speaker is sufficient to reach the performance of a speaker-dependent system. In this thesis, we consider the application of speaker adaptation techniques to Mandarin speech. The vocabulary we study has 76 syllables, which include 19 INITIALs and 4 FINALs from the confusable sets of Mandarin syllables. For the reference systems in speaker adaptation, we create speaker-dependent and speaker-independent systems based on the semi-continuous density hidden Markov model (SCHMM). The speaker-dependent system has an average recognition rate of 90.46% and the speaker-independent system 58.97%. On the basis of these two reference systems, we study Bayesian adaptation techniques with the forward-backward training procedure. We apply the adaptation techniques to adjust the codebooks, mixture weights, and transition probabilities of the SCHMM. Experimental results show that, with only one training token, the adaptation procedure outperforms the speaker-independent system, raising the recognition rate from 58.97% to 76.65%. When 3 training tokens are used, the recognition rate approaches that of the speaker-dependent system. With 6 training tokens, the recognition rate exceeds that of the speaker-dependent system. Chi-Min Liu 劉啟民 1995 學位論文 ; thesis 70 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Chiao Tung University === Institute of Computer Science and Information Engineering === 83 === A speaker-dependent speech recognition system achieves a high
recognition rate, but it needs a large amount of speaker-specific
training data. A speaker-independent (or multi-speaker) system
needs no training data from new speakers, but it usually cannot
achieve satisfactory performance. A speaker-adaptive system uses
the existing knowledge of a reliably trained reference
system, so that a small amount of training data from a new speaker
is sufficient to reach the performance of a speaker-dependent
system. In this thesis, we consider the application of
speaker adaptation techniques to Mandarin speech. The
vocabulary we study has 76 syllables, which include 19
INITIALs and 4 FINALs from the confusable sets of Mandarin
syllables. For the reference systems in speaker adaptation, we
create speaker-dependent and speaker-independent systems based
on the semi-continuous density hidden Markov model (SCHMM). The
speaker-dependent system has an average recognition rate of
90.46% and the speaker-independent system 58.97%. On the basis
of these two reference systems, we study Bayesian
adaptation techniques with the forward-backward training
procedure. We apply the adaptation techniques to adjust
the codebooks, mixture weights, and transition
probabilities of the SCHMM. Experimental results show that,
with only one training token, the adaptation procedure
outperforms the speaker-independent system, raising the
recognition rate from 58.97% to 76.65%. When 3
training tokens are used, the recognition rate approaches
that of the speaker-dependent system. With 6
training tokens, the recognition rate exceeds
that of the speaker-dependent system.
|
author2 |
Chi-Min Liu |
author_facet |
Chi-Min Liu Chien-Hung Chen 陳健宏 |
author |
Chien-Hung Chen 陳健宏 |
spellingShingle |
Chien-Hung Chen 陳健宏 Speaker Adaptation for Mandarin Syllable Recognition Based on Semi-Continuous Density HMM |
author_sort |
Chien-Hung Chen |
title |
Speaker Adaptation for Mandarin Syllable Recognition Based on Semi-Continuous Density HMM |
title_short |
Speaker Adaptation for Mandarin Syllable Recognition Based on Semi-Continuous Density HMM |
title_full |
Speaker Adaptation for Mandarin Syllable Recognition Based on Semi-Continuous Density HMM |
title_fullStr |
Speaker Adaptation for Mandarin Syllable Recognition Based on Semi-Continuous Density HMM |
title_full_unstemmed |
Speaker Adaptation for Mandarin Syllable Recognition Based on Semi-Continuous Density HMM |
title_sort |
speaker adaptation for mandarin syllable recognition based on semi-continuous density hmm |
publishDate |
1995 |
url |
http://ndltd.ncl.edu.tw/handle/66731320553819607466 |
work_keys_str_mv |
AT chienhungchen speakeradaptationformandarinsyllablerecognitionbasedonsemicontinuousdensityhmm AT chénjiànhóng speakeradaptationformandarinsyllablerecognitionbasedonsemicontinuousdensityhmm AT chienhungchen lìyòngbànliánxùxíngyǐncángshìmǎkěfūmóxíngjiànlìdeyǔzhědiàoshìzhīzhōngwényǔyīnbiànshíxìtǒng AT chénjiànhóng lìyòngbànliánxùxíngyǐncángshìmǎkěfūmóxíngjiànlìdeyǔzhědiàoshìzhīzhōngwényǔyīnbiànshíxìtǒng |
_version_ |
1716868648158625792 |