Trainable videorealistic speech animation
Main Author: Ezzat, Tony F. (Tony Farid)
Other Authors: Tomaso Poggio; Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science
Format: Thesis
Language: English
Published: Massachusetts Institute of Technology, 2005
Subjects: Electrical Engineering and Computer Science
Online Access: http://hdl.handle.net/1721.1/8020
Source: NDLTD
Record ID: ndltd-MIT-oai-dspace.mit.edu-1721.1-8020
Physical Description: 58 p., application/pdf (3,669,478 bytes)
Other Identifier: 52293159
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See http://dspace.mit.edu/handle/1721.1/7582 for inquiries about permission.
Description:
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. Includes bibliographical references (p. 53-58).

I describe how to create a generative, videorealistic speech animation module using machine learning techniques. A human subject is first recorded with a video camera as he/she utters a predetermined speech corpus. After the corpus is processed automatically, a visual speech module is learned from the data; it can synthesize the human subject's mouth uttering entirely novel utterances that were not recorded in the original video. The synthesized utterance is re-composited onto a background sequence containing natural head and eye movement. The final output is videorealistic in the sense that it looks like a video-camera recording of the subject. At run time, the input to the system can be either real audio sequences or synthetic audio produced by a text-to-speech system, as long as they have been phonetically aligned. The two key contributions of this work are:

* a variant of the multidimensional morphable model (MMM) [4] [26] [25] that synthesizes new, previously unseen mouth configurations from a small set of mouth image prototypes (see the first sketch below this list), and
* a trajectory synthesis technique based on regularization, automatically trained from the recorded video corpus, that can synthesize trajectories in MMM space corresponding to any desired utterance (see the second sketch).

Results are presented on a series of numerical and psychophysical experiments designed to evaluate the synthetic animations.

by Tony Farid Ezzat. Ph.D.
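The MMM described in the abstract represents mouths with separate shape (optical-flow) and texture (image) components learned from a small set of prototype frames. As a rough illustration of how a new, unseen mouth can be synthesized from prototypes, the Python sketch below linearly combines prototype flow fields and textures and then warps the blended texture; the function name, the (2, H, W) flow layout, and the backward-warp shortcut are all assumptions for illustration, not the thesis's actual formulation.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def synthesize_mouth(alphas, betas, prototypes, flows):
    """Hypothetical MMM-style synthesis sketch.

    prototypes: list of N grayscale mouth images, each (H, W); index 0 is
                treated as the reference frame.
    flows:      list of N flow fields, each (2, H, W), giving the (dx, dy)
                displacement from the reference to each prototype.
    alphas:     N shape weights; betas: N texture weights (each sums to 1).
    """
    H, W = prototypes[0].shape
    # Shape: linearly combine the prototype flow fields.
    flow = sum(a * f for a, f in zip(alphas, flows))
    # Texture: linearly combine the prototype images in reference
    # coordinates (the thesis warps each prototype before blending;
    # this blend-then-warp order is a simplification).
    texture = sum(b * p for b, p in zip(betas, prototypes))
    # Warp the blended texture along the combined flow. Backward warping
    # with the negated flow approximates a forward warp for small motions.
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    coords = np.stack([ys - flow[1], xs - flow[0]])
    return map_coordinates(texture, coords, order=1, mode="nearest")
```

Varying the weights sweeps out mouth configurations lying between (and beyond) the recorded prototypes, which is the property the abstract's first contribution relies on.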
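The second contribution turns a phonetically aligned utterance into a smooth path through MMM parameter space via regularization. One standard way to write such an objective is a per-frame target term plus a smoothness penalty, which has a closed-form solution; the sketch below uses that form with diagonal covariances, and every identifier in it is a stand-in assumption rather than the trained model the thesis actually describes.

```python
import numpy as np

def synthesize_trajectory(mu, inv_var, lam=1.0):
    """Hypothetical regularized trajectory synthesis sketch.

    mu:      (T, K) per-frame targets in MMM space, e.g. the learned mean
             of whichever phone is aligned to each frame.
    inv_var: (T, K) per-frame inverse variances (diagonal covariances).
    lam:     smoothness weight; larger values trade target fidelity
             for smoother motion.

    Minimizes, independently for each dimension k:
        sum_t inv_var[t, k] * (y[t, k] - mu[t, k])**2 + lam * ||D y[:, k]||^2
    where D is the first-difference operator.
    """
    T, K = mu.shape
    D = np.diff(np.eye(T), axis=0)       # (T-1, T) first differences
    smooth = lam * (D.T @ D)             # smoothness term, shared across dims
    y = np.empty_like(mu, dtype=float)
    for k in range(K):
        A = np.diag(inv_var[:, k]) + smooth   # normal-equations matrix
        b = inv_var[:, k] * mu[:, k]
        y[:, k] = np.linalg.solve(A, b)       # closed-form minimizer
    return y
```

In this toy formulation, mu would be built by repeating each phone's learned mean over its aligned frames, so the regularizer is what produces the coarticulation-like smoothing between phones.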