Hidden Markov Models for Audio to Visual Mapping

Master's === National Dong Hwa University === Department of Computer Science and Information Engineering === 93 === Hidden Markov Models (HMMs) are powerful statistical models. Their recent applications include speech and pattern recognition, and some researchers have developed Audio-Visual Speech Recognition (AV-ASR) systems. On the other hand, we discuss using HMMs ma...


Bibliographic Details
Main Authors: Guang-Yi Wang, 王光一
Other Authors: Mau-Tsuen Yang
Format: Others
Language: en_US
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/30753519038304625119
id ndltd-TW-093NDHU5392065
record_format oai_dc
spelling ndltd-TW-093NDHU5392065 2016-06-06T04:11:19Z http://ndltd.ncl.edu.tw/handle/30753519038304625119 Hidden Markov Models for Audio to Visual Mapping 隱藏馬可夫模型在音訊到視訊特徵對應之應用 Guang-Yi Wang 王光一 Mau-Tsuen Yang 楊茂村 2005 學位論文 ; thesis 55 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Dong Hwa University === Department of Computer Science and Information Engineering === 93 === Hidden Markov Models (HMMs) are powerful statistical models. Their recent applications include speech and pattern recognition, and some researchers have developed Audio-Visual Speech Recognition (AV-ASR) systems. This thesis instead uses HMMs to map between different kinds of signals. We show how to translate audio signals into Facial Animation Parameters (FAPs) and how to adjust the model parameters for different languages and talking styles using a virtual talking head. The system has three parts: signal processing, model training, and synthesis. First, in signal processing, Mel-scale Frequency Cepstral Coefficients (MFCCs) and FAPs are used to extract feature vectors from the audio and the video, respectively. Second, in training, we examine the parameters of both the HMMs and the Gaussian Mixture Model (GMM). Finally, once the FAPs corresponding to the audio have been obtained, they are fed into the Facial Animation Engine (FAE) to create new video. In the experiments, we aim for a talking head that can not only imitate talking and singing but also simulate speech in several languages. This work can be applied to e-learning, online guides, real-time virtual conferencing, and similar applications.
author2 Mau-Tsuen Yang
author_facet Mau-Tsuen Yang
Guang-Yi Wang
王光一
author Guang-Yi Wang
王光一
author_sort Guang-Yi Wang
title Hidden Markov Models for Audio to Visual Mapping
publishDate 2005
url http://ndltd.ncl.edu.tw/handle/30753519038304625119