Synthetic Speech Signal-quality Improving Methods Using Minimum-Generation-Error Trained HMM and Global Variance Matching

Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 103 === In this thesis, we adopt a new HMM (hidden Markov model) structure, the half HMM (half context-dependent and half-sized), and the fluency of the synthetic speech is noticeably improved when only a limited number of training sentences is available. In addition, we study a method that combines minimum-generation-error (MGE) based HMM training with formant enhancement or global variance matching to alleviate the problem of spectral over-smoothing, which in turn improves the signal quality of the synthetic speech.

When implementing MGE-based HMM training, we program two different procedures, called the formula-simplification procedure and the dimension-independence procedure. According to the measured generation errors, the dimension-independence procedure is found to be the better one. In practice, MGE-based HMM training has three implementation factors that need to be considered, so we compare different combinations of these factors in terms of objective measures (average MFCC distance and variance ratio). It is found that keeping the covariance matrices unchanged and using initial HMMs trained with the segmental K-means method is the better choice. According to the measured average MFCC distances, the ensemble-training flow is better than the incremental-training flow studied here; nevertheless, when the measured variance ratios are considered, the incremental-training flow is the better one.

As to formant enhancement, by comparing the spectral envelopes obtained with different methods, we found that the geometric-series method proposed here is better than the constant-series method. As to global variance matching, an appropriate weight value must be set to prevent abrupt amplitude changes and clicks from occurring. According to the results of listening tests, among the speech synthesis methods using MGE-trained HMMs, the HMM trained with the incremental-training flow performs better than the one trained with the ensemble-training flow. The results also show that global variance matching and formant enhancement can, on the whole, improve the signal quality of the synthetic speech. Nevertheless, clicks or harsh noises may sometimes be heard in the synthesized speech, which causes their MOS scores to decrease.
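The abstract refers to two objective measures (average MFCC distance and variance ratio) and to global variance matching controlled by a weight value, but does not give their formulations. The Python sketch below shows one common way such quantities can be computed; the function names, the interpolation-weight scheme, and the toy data are illustrative assumptions, not the thesis's actual implementation.

# Minimal sketch (not the thesis's code) of the objective measures named in the
# abstract and of per-dimension global variance (GV) matching with a weight.
import numpy as np

def average_mfcc_distance(natural, generated):
    # Mean Euclidean distance between natural and generated MFCC frames,
    # where both arrays have shape (T, D): T frames of D-dimensional MFCCs.
    return float(np.mean(np.linalg.norm(natural - generated, axis=1)))

def variance_ratio(natural, generated):
    # Per-dimension ratio of generated to natural variance, averaged over
    # dimensions; values well below 1.0 indicate spectral over-smoothing.
    return float(np.mean(np.var(generated, axis=0) / np.var(natural, axis=0)))

def gv_match(generated, target_var, weight=1.0):
    # Scale each dimension of `generated` toward a target (natural) variance.
    # weight in [0, 1] interpolates between no change (0) and full matching (1);
    # the abstract notes an appropriate weight is needed to avoid abrupt
    # amplitude changes and clicks.
    mean = generated.mean(axis=0)
    gen_var = generated.var(axis=0)
    scale = np.sqrt(target_var / np.maximum(gen_var, 1e-12))  # full-matching scale
    scale = 1.0 + weight * (scale - 1.0)                       # weighted scale
    return (generated - mean) * scale + mean

# Toy usage: over-smoothed trajectories get part of their variance restored.
rng = np.random.default_rng(0)
natural = rng.normal(0.0, 1.0, size=(200, 25))                     # stand-in natural MFCCs
generated = 0.5 * natural + rng.normal(0.0, 0.1, size=(200, 25))   # over-smoothed output
matched = gv_match(generated, natural.var(axis=0), weight=0.8)
print(average_mfcc_distance(natural, generated))
print(variance_ratio(natural, generated), variance_ratio(natural, matched))

In this reading, a weight below 1.0 limits how aggressively the variance is restored, which is one way to reconcile the abstract's observation that variance matching improves signal quality yet can introduce abrupt amplitude changes if pushed too far.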


Bibliographic Details
Title (Chinese): 使用MGE訓練之HMM模型及全域變異數匹配之合成語音信號音質改進方法
Main Author: Wei-hsiang Hong (洪尉翔)
Other Authors: Hung-yan Gu (古鴻炎)
Format: Others (學位論文 / thesis, 114 pages)
Language: zh-TW
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/13473931426754293898