Normalization and Prediction of Syllable Initial and Final Durations for Speech Synthesis

Bibliographic Details
Main Authors: LIOU-ZIH-YANG, 劉子揚
Other Authors: Hung-Yan Gu
Format: Others
Language: zh-TW
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/49zkw8
Description
Summary: Master's thesis === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 105 === This thesis studies normalization methods for syllable initial and final durations. It also designs a feature set with which Weka constructs classification and regression trees (CART) that predict the syllable initial and final durations of a text sentence to be synthesized. By combining the two studies (duration normalization and CART-based duration prediction), we aim to increase the naturalness of the synthesized speech, especially in the arrangement of initial and final durations. In the training stage, the original durations of each syllable initial and final are read from the label file of the corresponding training sentence. The method proposed here, two-level standard deviation matching, is then used to normalize these durations. Next, Weka is used to construct two CART trees, one for the durations of syllable initials and one for those of syllable finals. In the synthesis stage, we develop program modules that predict the duration of a syllable initial or final according to the two CART trees constructed by Weka. These modules are then integrated into the speech synthesis system developed by earlier researchers, so that the system can synthesize speech signals using the duration normalization and prediction methods studied in this thesis. Using the synthesized speech, we conduct two types of listening tests: a naturalness comparison and a naturalness MOS evaluation. According to the average scores of the naturalness comparison test, the duration prediction method studied here is better than the method provided by the earlier researchers, because it arranges syllable initial and final durations more naturally.
In addition, according to the average scores of the naturalness MOS evaluation, most participants agree that the speech synthesized with our duration prediction method is very close to the corresponding speech uttered by a real speaker. Specifically, the average scores of our synthetic utterances are all greater than 3.5 points, and one of them is greater than 4 points. Therefore, the naturalness of speech synthesized with our duration normalization and prediction methods is very close to that of speech uttered by a real person.
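
The summary names the normalization method (two-level standard deviation matching) but does not define it. As a rough illustration of the general idea only, the Python sketch below performs a plain, single-level mean and standard deviation matching: it linearly rescales a group of durations so that their mean and standard deviation equal chosen targets. The function name `match_mean_std`, the target values, and the sample durations are all hypothetical, and this is not claimed to be the thesis's two-level procedure.

```python
from statistics import mean, pstdev


def match_mean_std(durations, target_mean, target_std):
    """Linearly rescale durations to a target mean and standard deviation.

    Assumption: a generic single-level matching, not the two-level
    method proposed in the thesis (which the summary does not define).
    """
    src_mean = mean(durations)
    src_std = pstdev(durations)  # population standard deviation
    if src_std == 0:
        # All durations are identical, so only the mean can be matched.
        return [float(target_mean)] * len(durations)
    return [
        (d - src_mean) / src_std * target_std + target_mean
        for d in durations
    ]


# Hypothetical syllable-initial durations (in milliseconds) from one sentence.
sentence_initial_durations = [60.0, 85.0, 110.0, 45.0]

# Hypothetical corpus-level targets chosen only for this illustration.
normalized = match_mean_std(
    sentence_initial_durations, target_mean=80.0, target_std=20.0
)
print([round(d, 2) for d in normalized])
```

Because the mapping is linear with a positive slope, the relative ordering of the durations within the group is preserved; only their location and spread change.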