Summary: | Master's === National Dong Hwa University === Department of Computer Science and Information Engineering === 93 === Hidden Markov Models (HMMs) are a powerful statistical probability model. Their recent applications include speech and pattern recognition, and some researchers have developed Audio-Visual Speech Recognition (AV-ASR) systems.
This thesis, on the other hand, uses HMMs to map between different kinds of signals. It shows how to translate audio signals into Facial Animation Parameters (FAPs) and how to adjust the model parameters for different languages and talking styles via a Virtual Talking Head. The system has three parts: signal processing, model training, and synthesis. First, in signal processing, Mel-scale Frequency Cepstral Coefficients (MFCCs) and FAPs are used to extract feature vectors from the audio and video. Second, in training, we examine the parameters of both the HMMs and the Gaussian Mixture Model (GMM). Finally, once the FAPs corresponding to the audio parameters have been obtained, they are fed into the Facial Animation Engine (FAE) to create new video. In the experiments, we aim for a talking head that can not only imitate talking and singing but also simulate speech in multiple languages. This work can be applied to e-learning, online guides, real-time virtual conferences, and so on.
|