Optimization of an Image-Based Talking Head System
This paper presents an image-based talking head system, which includes two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, which is composed of a personalized 3D mask as well as a large database of mouth images and their related information. The synthesis part generates natural looking facial animations from phonetic transcripts of text. A critical issue of the synthesis is the unit selection which selects and concatenates these appropriate mouth images from the database such that they match the spoken words of the talking head. Selection is based on lip synchronization and the similarity of consecutive images. The unit selection is refined in this paper, and Pareto optimization is used to train the unit selection. Experimental results of subjective tests show that most people cannot distinguish our facial animations from real videos.
Main Authors: | Kang Liu, Joern Ostermann |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2009-01-01 |
Series: | EURASIP Journal on Audio, Speech, and Music Processing |
ISSN: | 1687-4714, 1687-4722 |
Online Access: | http://dx.doi.org/10.1155/2009/174192 |
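
The abstract describes unit selection as choosing and concatenating mouth images so that they match the spoken phonemes (lip synchronization) while transitioning smoothly (similarity of consecutive images), with Pareto optimization used to train the selection. As a rough illustration of that kind of search, the sketch below implements a generic Viterbi-style unit selection over feature vectors. The function name, the feature representation, and the weights `w_target` and `w_concat` (standing in for whatever parameters the Pareto optimization actually tunes) are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of a unit-selection pass like the one described in the
# abstract: pick one mouth image per target frame by minimizing a weighted sum
# of a target cost (lip synchronization) and a concatenation cost (similarity
# of consecutive images), via Viterbi-style dynamic programming.
import numpy as np

def unit_selection(target_features, candidate_features, w_target=1.0, w_concat=1.0):
    """Select one candidate mouth image per frame.

    target_features:    (T, D) array - desired visual features per phoneme frame
    candidate_features: (N, D) array - features of the N mouth images in the database
    Returns a list of T candidate indices (one mouth image per frame).
    """
    T, _ = target_features.shape
    N = candidate_features.shape[0]

    # Target cost: distance between every candidate and every target frame, shape (T, N).
    target_cost = np.linalg.norm(
        candidate_features[None, :, :] - target_features[:, None, :], axis=2
    )

    # Concatenation cost: dissimilarity between any two consecutive candidates, shape (N, N).
    concat_cost = np.linalg.norm(
        candidate_features[:, None, :] - candidate_features[None, :, :], axis=2
    )

    # Viterbi accumulation: cost[j] is the best path cost ending in candidate j.
    cost = w_target * target_cost[0]
    backptr = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        total = cost[:, None] + w_concat * concat_cost   # rows: previous, cols: current
        backptr[t] = np.argmin(total, axis=0)
        cost = total[backptr[t], np.arange(N)] + w_target * target_cost[t]

    # Trace back the lowest-cost sequence of mouth images.
    path = [int(np.argmin(cost))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]
```

With this structure, lowering `w_concat` favors frames that track the target phonemes closely at the cost of choppier transitions; exposing that trade-off between lip-sync error and smoothness is the sort of thing a Pareto front over the two costs could be used to tune.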