Multimodal Translation System Using Texture-Mapped Lip-Sync Images for Video Mail and Automatic Dubbing Applications

We introduce a multimodal English-to-Japanese and Japanese-to-English translation system that also translates the speaker's speech motion by synchronizing it to the translated speech. The system combines a face synthesis technique that can generate the lip shape of any viseme with a face tracking technique that estimates the position and rotation of the speaker's face in an image sequence. To retain the speaker's facial expression, we substitute only the speech-organ region of the image with a synthesized one, generated from a 3D wire-frame model that can be adapted to any speaker. This approach achieves translated image synthesis with an extremely small database. Face motion is tracked in the video by template matching: the translation and rotation of the face are estimated using a 3D personal face model whose texture is captured from a video frame. We also propose a method for customizing the personal face model with our GUI tool. By combining these techniques with translated voice synthesis, we achieve automatic multimodal translation suitable for video mail and automatic dubbing into other languages.
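The tracking technique summarized above first locates the face by template matching, after which the textured 3D personal face model recovers its translation and rotation. As a rough illustration only, and not the authors' implementation, the following Python sketch shows the normalized-cross-correlation template search such a tracker builds on, using OpenCV; the function name track_face and its grayscale inputs are assumptions made for the example.

    # Minimal sketch (an illustrative assumption, not the paper's system):
    # find a face template in a video frame by normalized cross-correlation.
    import cv2

    def track_face(frame_gray, template_gray):
        # Score every placement of the template over the frame.
        scores = cv2.matchTemplate(frame_gray, template_gray,
                                   cv2.TM_CCOEFF_NORMED)
        # The best-scoring location gives the face's 2D translation;
        # the paper additionally estimates rotation via its textured
        # 3D personal face model, which is out of scope here.
        _, best_score, _, top_left = cv2.minMaxLoc(scores)
        return top_left, best_score

In the pipeline the abstract describes, this recovered pose is what allows the synthesized speech-organ image, keyed to the visemes of the translated speech, to replace the original mouth region in each frame.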

Bibliographic Details
Main Authors: Nakamura Satoshi, Morishima Shigeo
Format: Article
Language: English
Published: SpringerOpen, 2004-01-01
Series: EURASIP Journal on Advances in Signal Processing
ISSN: 1687-6172, 1687-6180
Subjects: audio-visual speech translation; lip-sync talking head; face tracking with 3D template; video mail and automatic dubbing; texture-mapped facial animation; personal face model
Online Access: http://dx.doi.org/10.1155/S1110865704404259