Summary: Master's thesis, National Taiwan University of Science and Technology, Department of Electrical Engineering, academic year 88 (ROC calendar).

Syllable duration and amplitude are two important prosodic parameters for Mandarin text-to-speech (TTS) because they strongly affect the fluency and naturalness of synthesized speech. In this thesis, a method based on vector quantization (VQ) and hidden Markov models (HMMs) is used to model syllable duration and syllable amplitude separately. For convenience, the two models are collectively called DA-HMM.

In the training phase of DA-HMM, the durations and amplitudes of the syllables in each training sentence are first normalized. The average duration and amplitude of each syllable type and each syllable-final type are then computed from the normalized training syllables. Based on these averages, the 410 syllable types and the 37 syllable-final types are each classified by vector quantization. The VQ codes of adjacent syllables in a training sentence are then combined to form the observation sequence used for HMM training.

In the synthesis phase, word-boundary and breath-group information from the text-processing stage is used to arrange the state transition sequence of DA-HMM. Then, given the assigned state and the encoded observation symbol, the duration and amplitude of each syllable in the sentence to be synthesized are looked up from the auxiliary parameters of DA-HMM estimated in the training phase.

Several experiments were conducted to evaluate DA-HMM. In the inside test, the average prediction errors of syllable duration and amplitude are 43 ms and 1.1 dB, respectively; in the outside test, they are 22 ms and 2.2 dB, respectively.
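The training and synthesis steps described above can be sketched for the duration model as a toy illustration. This is not the thesis's implementation, and every name in it is hypothetical. The abstract does not specify several details, so the sketch makes its own assumptions about each of them:

- Normalization divides each value by its sentence mean.
- The quantizer is 1-D k-means over per-syllable-type averages.
- An observation symbol is the (previous code, current code) pair.
- States are word-position labels assigned directly from text.

The sketch also skips HMM parameter training entirely. It only averages normalized values per (state, symbol) and looks them up at synthesis time, mimicking the "auxiliary parameters" lookup. The same code would apply to amplitude in dB.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL data layout: each training sentence is a list of
# (syllable_type, state, duration_ms) tuples. "state" is an assumed
# word-position label ("B" initial, "M" medial, "E" final, "S" one-syllable word).


def normalize_sentence(values):
    """Assumed normalization: divide every value by the sentence mean."""
    m = mean(values)
    return [v / m for v in values]


def quantize(x, cents):
    """Index of the nearest centroid, i.e. the VQ code of x."""
    return min(range(len(cents)), key=lambda j: abs(x - cents[j]))


def kmeans_1d(points, k, iters=50):
    """Lloyd's k-means on scalars, used here as a simple 1-D quantizer."""
    pts = sorted(points)
    # Spread the initial centroids across the sorted data.
    cents = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pts:
            groups[quantize(p, cents)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new = [mean(g) if g else cents[j] for j, g in enumerate(groups)]
        if new == cents:  # converged
            break
        cents = new
    return sorted(cents)


def train(sentences, k=2):
    """Return (syllable -> VQ code, (state, symbol) -> mean normalized value)."""
    # 1. Normalize each sentence separately.
    norm = []
    for sent in sentences:
        vals = normalize_sentence([v for _, _, v in sent])
        norm.append([(syl, st, v) for (syl, st, _), v in zip(sent, vals)])
    # 2. Average normalized value of each syllable type.
    per_type = defaultdict(list)
    for sent in norm:
        for syl, _, v in sent:
            per_type[syl].append(v)
    avg = {syl: mean(vs) for syl, vs in per_type.items()}
    # 3. Classify the syllable types into k codes by their averages.
    cents = kmeans_1d(list(avg.values()), k)
    code = {syl: quantize(a, cents) for syl, a in avg.items()}
    # 4. Symbol = (previous code, current code), with -1 at sentence start;
    # 5. collect normalized values per (state, symbol) and average them.
    table = defaultdict(list)
    for sent in norm:
        prev = -1
        for syl, st, v in sent:
            table[(st, (prev, code[syl]))].append(v)
            prev = code[syl]
    return code, {key: mean(vs) for key, vs in table.items()}


def predict(syllables, states, code, table, fallback=1.0):
    """Look up a normalized value for each syllable from its (state, symbol).

    fallback=1.0 means "the sentence mean" when a pair was never seen.
    """
    out, prev = [], -1
    for syl, st in zip(syllables, states):
        c = code.get(syl, 0)  # unseen syllable types fall back to code 0 (assumption)
        out.append(table.get((st, (prev, c)), fallback))
        prev = c
    return out


if __name__ == "__main__":
    toy = [  # made-up toy data, not from the thesis
        [("ni3", "B", 150), ("hao3", "E", 250)],
        [("ni3", "S", 100), ("hao3", "S", 200), ("ma5", "S", 60)],
    ]
    code, table = train(toy)
    print(predict(["ni3", "hao3"], ["B", "E"], code, table))
```

The predictions are normalized values. A real system would still need to map them back to milliseconds, or to dB for the amplitude model. Multiplying by a target sentence mean is one way to do this, but that mapping is likewise an assumption here.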