Master's thesis, National Tsing Hua University, Department of Computer Science, academic year 94.
A corpus-based TTS system is prone to degraded naturalness because of acoustic mismatches between the synthesis units it selects. Moreover, collecting the speech corpus is a labor-intensive task. We have therefore developed a carrier-sentence-based TTS system for Mandarin Chinese, and our lab continues to refine it to strike a balance among synthesis speed, corpus size, and the naturalness of the output utterances.
In this thesis, we investigate several methods for generating the prosodic parameters of a Mandarin TTS system: linear regression, an artificial neural network, and support vector regression (SVR). To identify the best regression model for prosody generation, we compare the root-mean-square error (RMSE) of each method on both inside (training-set) and outside (held-out) tests, and we also carry out a listening test. The neural network and the SVM achieve lower RMSE than linear regression, and we further tuned the parameters of the SVM.
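The inside/outside comparison described above can be illustrated with a minimal sketch. It assumes a plain linear-regression model fitted by ordinary least squares, hypothetical per-syllable features (normalized position in the sentence and tone number), and toy syllable durations that are not data from the thesis. The inside test scores the model on its own training data; the outside test scores it on held-out data. The neural network and SVR models would be compared the same way by swapping in a different fit/predict pair; the function names here are illustrative only.

```python
"""Sketch: inside vs. outside RMSE for a linear-regression prosody model.
Features, targets, and function names are hypothetical illustrations."""
import math
from typing import List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_linear(X: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations (A^T A) w = A^T y.
    A bias column of 1s is prepended; returns [bias, w1, ..., wd]."""
    A = [[1.0] + list(row) for row in X]
    n = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(n)]
         + [sum(a[i] * t for a, t in zip(A, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    # Only the diagonal remains in the coefficient part, so solve directly.
    return [M[i][n] / M[i][i] for i in range(n)]


def predict(w: List[float], X: List[List[float]]) -> List[float]:
    """Apply the fitted weights (bias first) to each feature row."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]


if __name__ == "__main__":
    # Hypothetical features per syllable: [position in sentence (0..1), tone (1..4)]
    X_train = [[0.0, 1], [0.25, 2], [0.5, 3], [0.75, 4], [1.0, 1], [0.5, 2]]
    y_train = [212.0, 205.0, 207.0, 201.0, 161.0, 194.0]  # toy durations (ms)
    X_test = [[0.1, 3], [0.9, 2]]                         # held-out syllables
    y_test = [233.0, 176.0]
    w = fit_linear(X_train, y_train)
    print("inside RMSE :", rmse(y_train, predict(w, X_train)))
    print("outside RMSE:", rmse(y_test, predict(w, X_test)))
```

A large gap between the inside and outside RMSE suggests overfitting, which is why both tests are worth reporting.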
A listening test shows that, after our prosody modification, the TTS system indeed generates more natural-sounding utterances.