Synthetic Speech Signal-quality Improving Methods Using Minimum-Generation-Error Trained HMM and Global Variance Matching
Chinese title: 使用MGE訓練之HMM模型及全域變異數匹配之合成語音信號音質改進方法
Author: Wei-hsiang Hong (洪尉翔)
Advisor: Hung-yan Gu (古鴻炎)
Degree: Master's thesis (學位論文), 資訊工程系 (Department of Computer Science and Information Engineering), 國立臺灣科技大學 (National Taiwan University of Science and Technology), academic year 103
Published: 2015
Pages: 114
Language: Chinese (zh-TW)
Source: NDLTD
Record ID: ndltd-TW-103NTUS5392014
Online Access: http://ndltd.ncl.edu.tw/handle/13473931426754293898
Abstract:

In this thesis, we adopt a new HMM (hidden Markov model) structure, the half HMM (half context-dependent and half in size), which noticeably improves the fluency of the synthetic speech when only a limited number of training sentences is available. In addition, we study a method that combines minimum generation error (MGE) based HMM training with formant enhancement or global variance matching to alleviate spectral over-smoothing and thereby improve the signal quality of the synthetic speech.

For MGE-based HMM training, we implement two different procedures, called the formula-simplification procedure and the dimension-independence procedure. According to the measured generation errors, the dimension-independence procedure is the better of the two. In practice, MGE-based HMM training also involves three implementation factors, so we compare different combinations of these factors in terms of two objective measures, the average MFCC distance and the variance ratio. Keeping the covariance matrices unchanged and initializing the HMM with segmental K-means training turns out to be the better choice. In terms of the average MFCC distance, the ensemble-training flow is better than the incremental-training flow studied here; in terms of the variance ratio, however, the incremental-training flow is better.

For formant enhancement, comparing the spectral envelopes obtained with different methods shows that the geometric-series method proposed here is better than the constant-series method. For global variance matching, an appropriate weight value must be chosen to prevent abrupt amplitude changes and clicks.

Listening tests show that, among the synthesis methods using MGE-trained HMMs, the HMM trained with the incremental-training flow performs better than the one trained with the ensemble-training flow. The results also show that global variance matching and formant enhancement generally improve the signal quality of the synthetic speech; nevertheless, clicks or harsh noises can sometimes be heard, which lowers the corresponding MOS scores.
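The abstract names two objective measures but does not reproduce their formulas. The sketch below is a minimal Python illustration under the assumption that the average MFCC distance is the frame-wise Euclidean distance between time-aligned generated and natural MFCC sequences, and that the variance ratio compares generated to natural per-dimension variance; the function names and the exact averaging are illustrative, not taken from the thesis.

    import numpy as np

    def average_mfcc_distance(gen, nat):
        # gen, nat: (T, D) time-aligned MFCC sequences with the same frame count.
        # Mean Euclidean distance per frame; smaller means closer to natural speech.
        return float(np.mean(np.linalg.norm(gen - nat, axis=1)))

    def variance_ratio(gen, nat):
        # Per-dimension variance of generated vs. natural trajectories, averaged
        # over dimensions; values well below 1 indicate spectral over-smoothing.
        return float(np.mean(gen.var(axis=0) / (nat.var(axis=0) + 1e-12)))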
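The abstract contrasts a geometric-series with a constant-series formant-enhancement method but gives no formulas. One plausible reading, sketched below as an assumption rather than the thesis's actual definition, is that both methods scale the cepstral coefficients c[1..D-1] to sharpen spectral peaks: the constant-series method applies one fixed factor, while the geometric-series method lets the extra gain decay geometrically with the coefficient index. The parameters gain and gamma are hypothetical.

    import numpy as np

    def enhance_formants(mcep, gain=1.3, gamma=0.8, mode="geometric"):
        # mcep: (T, D) mel-cepstral frames; c[0] (overall energy) is left untouched.
        T, D = mcep.shape
        if mode == "constant":
            # Constant-series weighting: the same boost for every coefficient n >= 1.
            w = np.full(D - 1, gain)
        else:
            # Geometric-series weighting: the extra gain (gain - 1) fades out
            # geometrically as the coefficient index n grows.
            n = np.arange(1, D)
            w = 1.0 + (gain - 1.0) * gamma ** (n - 1)
        out = mcep.copy()
        out[:, 1:] *= w  # broadcast the per-coefficient weights over all frames
        return out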
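Global variance (GV) matching is commonly implemented by pulling the per-utterance variance of each generated spectral dimension toward the global variance measured on natural training speech, with a weight controlling how far to go; the abstract's warning about clicks corresponds to setting this weight too aggressively. The sketch below is a minimal version under that assumption; the default weight and the linear interpolation of standard deviations are illustrative choices, not the thesis's exact scheme.

    import numpy as np

    def gv_match(mcep, gv_target, weight=0.6):
        # mcep:      (T, D) generated spectral parameters for one utterance.
        # gv_target: (D,)   global variance estimated from natural training speech.
        # weight:    0 leaves the trajectory unchanged, 1 matches gv_target exactly;
        #            too large a value can cause abrupt amplitude changes (clicks).
        mean = mcep.mean(axis=0)
        gv_gen = mcep.var(axis=0) + 1e-12   # variance of the (over-smoothed) output
        scale = (1.0 - weight) + weight * np.sqrt(gv_target / gv_gen)
        return (mcep - mean) * scale + mean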