Summary: | Master's === National Chiao Tung University === Department of Electrical and Control Engineering === 88 === In this thesis, we investigate a new technique for a Mandarin text-to-speech (TTS) system, focusing on the generation of prosodic information. We study new methods for constructing the fuzzy rules of a prosodic model designed to simulate the human brain. The proposed Recurrent Fuzzy Neural Network (RFNN) is a multi-layer recurrent neural network (RNN) that integrates a Self-cOnstructing Neural Fuzzy Inference Network (SONFIN) into a connectionist structure. The RFNN can be functionally partitioned into two parts. The first part adopts the SONFIN as a prosodic model that explores the relationship between high-level linguistic features and prosodic information by inferring fuzzy rules. The second part employs a five-layer network to generate all prosodic parameters from the prosodic fuzzy rules produced by the first part, together with other important syllable features that are fed in directly. The proposed method can not only overcome the difficulties that sandhi rules and other prosodic phenomena pose in traditional TTS systems, but also uncover some rules about prosodic phrase structure. Hence, we can generate proper prosodic parameters, including pitch means, pitch shapes, maximum energy levels, syllable durations, and pause durations, to synthesize fluent speech. To verify the performance of this prosodic model, we modify a previously developed TTS system that is based on the time-domain pitch-synchronous overlap-add (TD-PSOLA) method and uses a Mandarin monosyllable database. In listening tests, the synthetic speech was judged more natural than that of the previous version.
|