Synthetic Speech Signal-quality Improving Methods Using Minimum-Generation-Error Trained HMM and Global Variance Matching
Chinese title: 使用MGE訓練之HMM模型及全域變異數匹配之合成語音信號音質改進方法
Author: Wei-hsiang Hong (洪尉翔)
Advisor: Hung-yan Gu (古鴻炎)
Degree: Master's thesis (學位論文), 資訊工程系 (Department of Computer Science and Information Engineering), 國立臺灣科技大學 (National Taiwan University of Science and Technology), academic year 103
Published: 2015
Pages: 114
Language: Chinese (zh-TW)
Source: NDLTD
Record ID: ndltd-TW-103NTUS5392014
Online Access: http://ndltd.ncl.edu.tw/handle/13473931426754293898
Abstract:

In this thesis, we adopt a new HMM (hidden Markov model) structure, the half HMM (half context-dependent and half in size), which noticeably improves the fluency of the synthetic speech when only a limited number of training sentences is available. In addition, we study a method that combines minimum generation error (MGE) based HMM training with formant enhancement or global variance matching to alleviate spectral over-smoothing and thereby improve the signal quality of the synthetic speech.

For MGE-based HMM training, we implement two different procedures, called the formula-simplification procedure and the dimension-independence procedure. According to the measured generation errors, the dimension-independence procedure is the better of the two. In practice, MGE-based HMM training also involves three implementation factors, so we compare different combinations of these factors in terms of two objective measures, the average MFCC distance and the variance ratio. Keeping the covariance matrices unchanged and initializing the HMM with segmental K-means training turns out to be the better choice. In terms of the average MFCC distance, the ensemble-training flow is better than the incremental-training flow studied here; in terms of the variance ratio, however, the incremental-training flow is better.

For formant enhancement, comparing the spectral envelopes obtained with different methods shows that the geometric-series method proposed here is better than the constant-series method. For global variance matching, an appropriate weight value must be chosen to prevent abrupt amplitude changes and clicks.

Listening tests show that, among the synthesis methods using MGE-trained HMMs, the HMM trained with the incremental-training flow performs better than the one trained with the ensemble-training flow. The results also show that global variance matching and formant enhancement generally improve the signal quality of the synthetic speech; nevertheless, clicks or harsh noises can sometimes be heard, which lowers the corresponding MOS scores.
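The abstract names two objective measures but does not reproduce their formulas. The sketch below is a minimal Python illustration under the assumption that the average MFCC distance is the frame-wise Euclidean distance between time-aligned generated and natural MFCC sequences, and that the variance ratio compares generated to natural per-dimension variance; the function names and the exact averaging are illustrative, not taken from the thesis.

    import numpy as np

    def average_mfcc_distance(gen, nat):
        # gen, nat: (T, D) time-aligned MFCC sequences with the same frame count.
        # Mean Euclidean distance per frame; smaller means closer to natural speech.
        return float(np.mean(np.linalg.norm(gen - nat, axis=1)))

    def variance_ratio(gen, nat):
        # Per-dimension variance of generated vs. natural trajectories, averaged
        # over dimensions; values well below 1 indicate spectral over-smoothing.
        return float(np.mean(gen.var(axis=0) / (nat.var(axis=0) + 1e-12)))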
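The abstract contrasts a geometric-series with a constant-series formant-enhancement method but gives no formulas. One plausible reading, sketched below as an assumption rather than the thesis's actual definition, is that both methods scale the cepstral coefficients c[1..D-1] to sharpen spectral peaks: the constant-series method applies one fixed factor, while the geometric-series method lets the extra gain decay geometrically with the coefficient index. The parameters gain and gamma are hypothetical.

    import numpy as np

    def enhance_formants(mcep, gain=1.3, gamma=0.8, mode="geometric"):
        # mcep: (T, D) mel-cepstral frames; c[0] (overall energy) is left untouched.
        T, D = mcep.shape
        if mode == "constant":
            # Constant-series weighting: the same boost for every coefficient n >= 1.
            w = np.full(D - 1, gain)
        else:
            # Geometric-series weighting: the extra gain (gain - 1) fades out
            # geometrically as the coefficient index n grows.
            n = np.arange(1, D)
            w = 1.0 + (gain - 1.0) * gamma ** (n - 1)
        out = mcep.copy()
        out[:, 1:] *= w  # broadcast the per-coefficient weights over all frames
        return out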
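Global variance (GV) matching is commonly implemented by pulling the per-utterance variance of each generated spectral dimension toward the global variance measured on natural training speech, with a weight controlling how far to go; the abstract's warning about clicks corresponds to setting this weight too aggressively. The sketch below is a minimal version under that assumption; the default weight and the linear interpolation of standard deviations are illustrative choices, not the thesis's exact scheme.

    import numpy as np

    def gv_match(mcep, gv_target, weight=0.6):
        # mcep:      (T, D) generated spectral parameters for one utterance.
        # gv_target: (D,)   global variance estimated from natural training speech.
        # weight:    0 leaves the trajectory unchanged, 1 matches gv_target exactly;
        #            too large a value can cause abrupt amplitude changes (clicks).
        mean = mcep.mean(axis=0)
        gv_gen = mcep.var(axis=0) + 1e-12   # variance of the (over-smoothed) output
        scale = (1.0 - weight) + weight * np.sqrt(gv_target / gv_gen)
        return (mcep - mean) * scale + mean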