Synthesizing fundamental frequency using models automatically trained from data

Bibliographic Details
Main Author: Dusterhoff, K. E.
Published: University of Edinburgh 2000
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.649825
Description
Summary: The primary goal of this research is to produce stochastic models which can be used to generate fundamental frequency contours for synthetic utterances. The models produced are binary decision trees which are used to predict a parameterized description of fundamental frequency for an utterance. These models are trained using the sort of information which is typically available to a speech synthesizer during intonation generation. For example, the speech database is annotated with information about the location of word, phrase, segment, and syllable boundaries. The decision trees ask questions about such information. One obvious problem facing the stochastic modelling approach to intonation synthesis is obtaining data with the appropriate intonation annotation. This thesis presents a method by which such an annotation can be automatically derived for an utterance. The method uses Hidden Markov Models to label speech with intonation event boundaries given fundamental frequency, energy, and Mel frequency cepstral coefficients. Intonation events are fundamental frequency movements which relate to constituents larger than the syllable nucleus. Even if there is an abundance of fully labelled speech data, and the intonation synthesis models appear robust, it is important to produce an evaluation of the resulting intonation contours which allows comparison with other intonation synthesis methods. Such an evaluation could be used to compare versions of the same basic methodology or completely different methodologies. The question of intonation evaluation is addressed in this thesis in terms of system development. Objective methods of evaluating intonation contours are investigated and reviewed with regard to their ability to regularly provide feedback which can be used to improve the systems being evaluated. The fourth area investigated in this thesis is the interaction between segmental (phone) and suprasegmental (intonation) levels of speech. This investigation is not undertaken separately from the other investigations. Questions about phone-intonation interaction form a part of the research in both intonation synthesis and intonation analysis.