Summary: | Master's thesis === National Chiao Tung University === Department of Communication Engineering === 97 === As computing power and memory capacity have increased, corpus-based speech synthesis has become the most successful and widely used approach to speech synthesis. In such a system, the input text is first parsed to derive linguistic features, appropriate units are then selected from the corpus as candidates, and finally the synthesizer concatenates the best unit sequence to produce natural-sounding speech. In the unit selection process, discontinuities in the synthesized speech are usually caused either by choosing units whose context differs from that of the target units or by coarticulation effects. To address these problems, this thesis uses MFCC features to construct a syllable spectral model and, at the same time, labels the coarticulation state between adjacent syllables in a Chinese speech corpus. The model considers three factors that affect the syllable spectrum: the base syllable type of the current syllable, the coarticulation effect from the preceding syllable, and the coarticulation effect from the following syllable. These three factors are assumed to be independent and additive. After training, the learned factor patterns model the data well, and the updated coarticulation states can be reasonably explained by prosodic and linguistic features. Applying this method to the unit selection process of a TTS system can improve the quality of the synthesized speech.
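The additive three-factor assumption can be illustrated with a small sketch. Write the MFCC vector of syllable n as x_n ≈ B[t_n] + P[a_n] + F[b_n], where t_n is the base syllable type, a_n is the coarticulation state labeled at the juncture before the syllable, and b_n is the state at the juncture after it. Under this assumption the factor patterns B, P and F can be estimated by backfitting (alternating least squares): each table is repeatedly set to the mean residual left after subtracting the other two. The abstract does not state which training procedure the thesis uses, so backfitting, all function names, and the synthetic data below are hypothetical, and the re-labeling of coarticulation states is not shown.

```python
import random
from typing import List, Tuple

Vec = List[float]
# (MFCC-like vector, syllable type, previous-juncture state, next-juncture state)
Sample = Tuple[Vec, int, int, int]


def _zeros(n: int, dim: int) -> List[Vec]:
    return [[0.0] * dim for _ in range(n)]


def _add(u: Vec, v: Vec) -> Vec:
    return [a + b for a, b in zip(u, v)]


def _refit(samples: List[Sample], table: List[Vec], slot: int, other, dim: int) -> None:
    """Set each level of one factor to the mean of x minus the other two factors."""
    sums = _zeros(len(table), dim)
    counts = [0] * len(table)
    for s in samples:
        k, rest = s[slot], other(s)
        for d in range(dim):
            sums[k][d] += s[0][d] - rest[d]
        counts[k] += 1
    for k, c in enumerate(counts):
        if c:  # levels that never occur keep their previous pattern
            table[k] = [v / c for v in sums[k]]


def fit_additive(samples, dim, n_types, n_states, iters=200):
    """Hypothetical backfitting of x ~ base[type] + prev[state_before] + nxt[state_after]."""
    base, prev, nxt = _zeros(n_types, dim), _zeros(n_states, dim), _zeros(n_states, dim)
    for _ in range(iters):
        _refit(samples, base, 1, lambda s: _add(prev[s[2]], nxt[s[3]]), dim)
        _refit(samples, prev, 2, lambda s: _add(base[s[1]], nxt[s[3]]), dim)
        _refit(samples, nxt, 3, lambda s: _add(base[s[1]], prev[s[2]]), dim)
    return base, prev, nxt


def predict(model, s: Sample) -> Vec:
    base, prev, nxt = model
    return [b + p + f for b, p, f in zip(base[s[1]], prev[s[2]], nxt[s[3]])]


def mse(model, samples: List[Sample]) -> float:
    errs = [(a - b) ** 2 for s in samples for a, b in zip(s[0], predict(model, s))]
    return sum(errs) / len(errs)


def make_synthetic(n_syll, dim, n_types, n_states, seed=0) -> List[Sample]:
    """Noise-free synthetic data; adjacent syllables share the juncture state between them."""
    rng = random.Random(seed)

    def rand_table(n):
        return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]

    base, prev, nxt = rand_table(n_types), rand_table(n_states), rand_table(n_states)
    junction = [rng.randrange(n_states) for _ in range(n_syll + 1)]
    samples = []
    for n in range(n_syll):
        t, a, b = rng.randrange(n_types), junction[n], junction[n + 1]
        x = [base[t][d] + prev[a][d] + nxt[b][d] for d in range(dim)]
        samples.append((x, t, a, b))
    return samples


if __name__ == "__main__":
    data = make_synthetic(300, dim=4, n_types=5, n_states=3)
    model = fit_additive(data, dim=4, n_types=5, n_states=3)
    print(mse(model, data))  # near zero on this noise-free synthetic data
```

Only the sum of the three patterns is identifiable: a constant offset can move between B, P and F without changing the fit, so a real system would typically add a normalization (for example, zero-mean coarticulation patterns) before interpreting the individual tables.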