Improving Taiwanese Pronunciation Scoring via Multiple Acoustic Models

Master's === National Tsing Hua University === Institute of Information Systems and Applications === 99 === This thesis proposes the use of multiple acoustic models in order to improve Taiwanese pronunciation scoring. All pronunciation scoring used in this research is based on ranking of a phone model against its competing models so that the performance evaluation i...

Bibliographic Details
Main Authors:	Chen, Hung-Jui, 陳宏瑞
Other Authors:	Jang, Jyh-Shing Roger
Format:	Others
Language:	zh-TW
Published:	2011
Online Access:	http://ndltd.ncl.edu.tw/handle/36948546312262458178

id	ndltd-TW-099NTHU5394023
record_format	oai_dc
spelling	ndltd-TW-099NTHU5394023 2015-10-13T20:23:01Z http://ndltd.ncl.edu.tw/handle/36948546312262458178 Improving Taiwanese Pronunciation Scoring via Multiple Acoustic Models 使用多重聲學模型以改進台語語音評分 Chen, Hung-Jui 陳宏瑞 碩士 國立清華大學 資訊系統與應用研究所 99 Jang, Jyh-Shing Roger 張智星 2011 學位論文 ; thesis 41 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Tsing Hua University === Institute of Information Systems and Applications === 99 === This thesis proposes using multiple acoustic models to improve Taiwanese pronunciation scoring. All pronunciation scoring in this research is based on ranking a phone model against its competing models, so performance is also evaluated by ranking. Five training methods are used to generate acoustic models for different purposes. The first method trains general-purpose acoustic models on the entire training corpus; these models serve as our baseline system. The remaining four training methods each address a problem encountered during pronunciation scoring: inaccurate forced alignment, variations in Taiwanese accents, unreliable pronunciation scoring of consonant phonemes, and insufficient training data for certain right-context-dependent biphone models. First, since forced alignment results at the beginning and end of a sentence are usually inaccurate, we discard all sentence-initial and sentence-final phoneme segments from the training data and train our acoustic models on the remainder. Second, for accent variation, we found that certain occurrences of the "er" and "o" sounds in our training data were pronounced similarly and are easily confused by our speech recognition system. We address this problem by explicitly increasing the number of mixture components of the corresponding biphone models. Third, consonants are usually short in duration and lack a stable waveform, which makes them harder to model and their pronunciation scores unreliable. We tackle this problem by increasing the number of mixture components of all models.
Fourth, for the problem of insufficient training data for certain right-context-dependent biphone models, we found that certain biphone instances are rarely seen in our training data, while their corresponding monophone instances are abundant. We therefore increase the number of training iterations on these monophone models before extending them into biphone models. The experimental results show that the first three training methods effectively improve scoring performance, while the last method causes a slight decrease. However, we also found that the acoustic models trained by each of the five methods score well on different sets of phone models. We therefore propose a method that uses multiple acoustic models for pronunciation scoring. We select the best phone model among the five types of acoustic models described above by running an inside test, and then carry out an outside test for scoring using the selected phone models. The experimental results show that the proposed method performs better than any of the five individual models. Keywords: hidden Markov models, acoustic models, multiple acoustic models.
author2	Jang, Jyh-Shing Roger
author_facet	Jang, Jyh-Shing Roger Chen, Hung-Jui 陳宏瑞
author	Chen, Hung-Jui 陳宏瑞
spellingShingle	Chen, Hung-Jui 陳宏瑞 Improving Taiwanese Pronunciation Scoring via Multiple Acoustic Models
author_sort	Chen, Hung-Jui
title	Improving Taiwanese Pronunciation Scoring via Multiple Acoustic Models
title_short	Improving Taiwanese Pronunciation Scoring via Multiple Acoustic Models
title_full	Improving Taiwanese Pronunciation Scoring via Multiple Acoustic Models
title_fullStr	Improving Taiwanese Pronunciation Scoring via Multiple Acoustic Models
title_full_unstemmed	Improving Taiwanese Pronunciation Scoring via Multiple Acoustic Models
title_sort	improving taiwanese pronunciation scoring via multiple acoustic models
publishDate	2011
url	http://ndltd.ncl.edu.tw/handle/36948546312262458178
work_keys_str_mv	AT chenhungjui improvingtaiwanesepronunciationscoringviamultipleacousticmodels AT chénhóngruì improvingtaiwanesepronunciationscoringviamultipleacousticmodels AT chenhungjui shǐyòngduōzhòngshēngxuémóxíngyǐgǎijìntáiyǔyǔyīnpíngfēn AT chénhóngruì shǐyòngduōzhòngshēngxuémóxíngyǐgǎijìntáiyǔyǔyīnpíngfēn
_version_	1718047472660512768
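
The model-selection step described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the thesis: the selection criterion is assumed to be the lowest mean rank in the inside test, and the function names, model-set names, and rank values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_of_target(target_ll, competitor_lls):
    """Rank of the target phone model among its competitors (1 = best).

    The inputs are log-likelihoods of one speech segment under the target
    phone model and under each competing phone model.
    """
    return 1 + sum(1 for ll in competitor_lls if ll > target_ll)


def select_models(inside_ranks):
    """Pick, for each phone, the model set with the lowest mean inside-test rank.

    inside_ranks maps model-set name -> phone -> list of ranks observed on the
    training data. Using the mean rank is an assumed criterion; ties go to the
    model set listed first.
    """
    per_phone = defaultdict(dict)
    for model_set, phones in inside_ranks.items():
        for phone, ranks in phones.items():
            per_phone[phone][model_set] = mean(ranks)
    return {phone: min(means, key=means.get) for phone, means in per_phone.items()}


# Hypothetical inside-test ranks for two model sets and two phones.
inside_ranks = {
    "baseline": {"a": [1, 2, 3], "k": [4, 5]},
    "more_mixtures": {"a": [2, 2, 3], "k": [2, 3]},
}
chosen = select_models(inside_ranks)
print(chosen)
```

In an outside test, each phone in a test utterance would then be scored with the model set that `chosen` records for it, so no single acoustic model has to be the best for every phone.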

Similar Items