Optimization of an Image-Based Talking Head System

Bibliographic Details
Main Authors:	Kang Liu, Joern Ostermann
Format:	Article
Language:	English
Published:	SpringerOpen 2009-01-01
Series:	EURASIP Journal on Audio, Speech, and Music Processing
Online Access:	http://dx.doi.org/10.1155/2009/174192
ISSN:	1687-4714, 1687-4722
Source:	DOAJ

Description

This paper presents an image-based talking head system, which includes two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, which is composed of a personalized 3D mask as well as a large database of mouth images and their related information. The synthesis part generates natural-looking facial animations from phonetic transcripts of text. A critical issue of the synthesis is the unit selection, which selects and concatenates appropriate mouth images from the database such that they match the spoken words of the talking head. Selection is based on lip synchronization and the similarity of consecutive images. The unit selection is refined in this paper, and Pareto optimization is used to train the unit selection. Experimental results of subjective tests show that most people cannot distinguish our facial animations from real videos.
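
The abstract describes unit selection as choosing mouth images that both match the spoken sounds (lip synchronization) and resemble their neighbours (similarity of consecutive images), with Pareto optimization used to train the selection. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the two criteria can be expressed as a per-frame target cost and a pairwise concatenation cost, combines them with two weights, and finds the cheapest image sequence with a Viterbi (dynamic-programming) search. It then compares weight settings by their two unweighted errors and keeps the Pareto-optimal (non-dominated) ones. The function names `select_units`, `path_errors` and `pareto_front`, the cost arrays and the weights `w_target`/`w_concat` are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical cost model (assumption, not taken from the paper):
#   target_cost[t][k]    -> lip-sync mismatch of candidate image k at frame t
#   concat_cost[t][i][j] -> visual dissimilarity between candidate i at frame
#                           t-1 and candidate j at frame t (index 0 is unused)


def select_units(
    target_cost: Sequence[Sequence[float]],
    concat_cost: Sequence[Sequence[Sequence[float]]],
    w_target: float,
    w_concat: float,
) -> Tuple[List[int], float]:
    """Viterbi search for the image sequence with the lowest weighted cost."""
    # best[t][j]: cheapest accumulated cost of any path ending in candidate j at frame t
    best: List[List[float]] = [[w_target * c for c in target_cost[0]]]
    back: List[List[int]] = [[-1] * len(target_cost[0])]
    for t in range(1, len(target_cost)):
        row: List[float] = []
        ptr: List[int] = []
        for j, tc in enumerate(target_cost[t]):
            # Pick the predecessor minimising accumulated cost plus transition cost.
            i_best = min(
                range(len(best[t - 1])),
                key=lambda i: best[t - 1][i] + w_concat * concat_cost[t][i][j],
            )
            row.append(
                best[t - 1][i_best]
                + w_concat * concat_cost[t][i_best][j]
                + w_target * tc
            )
            ptr.append(i_best)
        best.append(row)
        back.append(ptr)
    # Backtrack from the cheapest candidate in the last frame.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k])
    total = best[-1][j]
    path = [j]
    for t in range(len(target_cost) - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return path, total


def path_errors(path, target_cost, concat_cost) -> Tuple[float, float]:
    """Unweighted (lip-sync error, smoothness error) of a selected path."""
    lip = sum(target_cost[t][k] for t, k in enumerate(path))
    smooth = sum(concat_cost[t][path[t - 1]][path[t]] for t in range(1, len(path)))
    return lip, smooth


def pareto_front(points: Sequence[Tuple[float, float]]) -> List[int]:
    """Indices of points not dominated by any other (both objectives minimised)."""
    front = []
    for a, (xa, ya) in enumerate(points):
        dominated = any(
            xb <= xa and yb <= ya and (xb < xa or yb < ya)
            for b, (xb, yb) in enumerate(points)
            if b != a
        )
        if not dominated:
            front.append(a)
    return front
```

In this toy formulation, `w_concat = 0` makes the search follow the best lip-sync match in every frame regardless of jumps between images, while a larger `w_concat` favours staying on visually similar images. A training loop could run `select_units` for many weight pairs, score each result with `path_errors`, and keep only the settings returned by `pareto_front`. The cost definitions and the weight search actually used in the paper are not given in this abstract.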

Similar Items