Optimization of an Image-Based Talking Head System

This paper presents an image-based talking head system, which includes two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, which is composed of a personalized 3D mask as well as a large database of mouth images and their related informat...


Bibliographic Details
Main Authors: Kang Liu, Joern Ostermann
Format: Article
Language: English
Published: SpringerOpen 2009-01-01
Series: EURASIP Journal on Audio, Speech, and Music Processing
Online Access: http://dx.doi.org/10.1155/2009/174192
id doaj-28f2d7e73a644c099334d596f7b0c129
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author Kang Liu
Joern Ostermann
title Optimization of an Image-Based Talking Head System
publisher SpringerOpen
series EURASIP Journal on Audio, Speech, and Music Processing
issn 1687-4714
1687-4722
publishDate 2009-01-01
description This paper presents an image-based talking head system, which includes two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, which is composed of a personalized 3D mask as well as a large database of mouth images and their related information. The synthesis part generates natural-looking facial animations from phonetic transcripts of text. A critical issue in the synthesis is the unit selection, which selects and concatenates appropriate mouth images from the database such that they match the spoken words of the talking head. Selection is based on lip synchronization and the similarity of consecutive images. The unit selection is refined in this paper, and Pareto optimization is used to train the unit selection. Experimental results of subjective tests show that most people cannot distinguish our facial animations from real videos.
url http://dx.doi.org/10.1155/2009/174192
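
The description says that unit selection picks mouth images by lip synchronization and by the similarity of consecutive images, and that Pareto optimization is used to train it. The record gives no implementation details, so the following is only a minimal sketch under stated assumptions: each mouth image is reduced to a single hypothetical scalar feature (for example, mouth opening), the lip-synchronization cost and the similarity cost are both absolute differences, and the names `select_units`, `pareto_front`, `w_target`, and `w_concat` are illustrative and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def select_units(
    targets: Sequence[float],
    database: Sequence[float],
    w_target: float = 1.0,
    w_concat: float = 1.0,
) -> List[int]:
    """Pick one database image index per target frame by dynamic programming.

    Assumption: the target cost |image - target| stands in for lip
    synchronization, and the concatenation cost |image_i - image_j| stands in
    for the similarity of consecutive images.
    """
    n = len(database)
    # cost[j]: cheapest total cost of any path ending in image j at this frame.
    cost = [w_target * abs(database[j] - targets[0]) for j in range(n)]
    back: List[List[int]] = []  # back[t][j]: best predecessor of j at frame t + 1
    for target in targets[1:]:
        new_cost, pointers = [], []
        for j in range(n):
            # Cheapest predecessor once the smoothness penalty is included.
            i_best = min(
                range(n),
                key=lambda i: cost[i] + w_concat * abs(database[i] - database[j]),
            )
            step = cost[i_best] + w_concat * abs(database[i_best] - database[j])
            new_cost.append(step + w_target * abs(database[j] - target))
            pointers.append(i_best)
        cost = new_cost
        back.append(pointers)
    # Trace the cheapest path backwards from the best final image.
    j = min(range(n), key=lambda k: cost[k])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path


def pareto_front(points: Sequence[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep the points that no other point dominates (lower is better in both)."""
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


if __name__ == "__main__":
    db = [0.0, 0.5, 1.0]        # hypothetical feature of each stored mouth image
    wanted = [0.0, 1.0, 0.0]    # hypothetical per-frame target feature
    print(select_units(wanted, db, w_target=1.0, w_concat=0.0))
    print(select_units(wanted, db, w_target=1.0, w_concat=10.0))
```

In this sketch, one way to imitate the training step would be to run `select_units` for several weight settings, score each result on two objectives, and pass the score pairs to `pareto_front` to keep only the non-dominated settings. Which objectives the authors actually use is not stated in this record.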