On the Importance of Audiovisual Coherence for the Perceived Quality of Synthesized Visual Speech

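Audiovisual text-to-speech systems convert a written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, the latter being either natural or synthesized speech. However, the perception of mismatches between these two information streams requires experimental exploration since it could degrade the quality of the output. In order to increase the intermodal coherence in synthetic 2D photorealistic speech, we extended the well-known unit selection audio synthesis technique to work with multimodal segments containing original combinations of audio and video. Subjective experiments confirm that the audiovisual signals created by our multimodal synthesis strategy are indeed perceived as being more synchronous than those of systems in which both modes are not intrinsically coherent. Furthermore, it is shown that the degree of coherence between the auditory mode and the visual mode has an influence on the perceived quality of the synthetic visual speech fragment. In addition, the audio quality was found to have only a minor influence on the perceived visual signal's quality.
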
Bibliographic Details
Main Authors:	Wesley Mattheyses, Lukas Latacz, Werner Verhelst
Format:	Article
Language:	English
Published:	SpringerOpen 2009-01-01
Series:	EURASIP Journal on Audio, Speech, and Music Processing
Online Access:	http://dx.doi.org/10.1155/2009/169819

id	doaj-0aab8d121cb74b8398f5e854f7b74de2
record_format	Article
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Wesley Mattheyses Lukas Latacz Werner Verhelst
title	On the Importance of Audiovisual Coherence for the Perceived Quality of Synthesized Visual Speech
publisher	SpringerOpen
series	EURASIP Journal on Audio, Speech, and Music Processing
issn	1687-4714 1687-4722
publishDate	2009-01-01
description	Audiovisual text-to-speech systems convert a written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, the latter being either natural or synthesized speech. However, the perception of mismatches between these two information streams requires experimental exploration since it could degrade the quality of the output. In order to increase the intermodal coherence in synthetic 2D photorealistic speech, we extended the well-known unit selection audio synthesis technique to work with multimodal segments containing original combinations of audio and video. Subjective experiments confirm that the audiovisual signals created by our multimodal synthesis strategy are indeed perceived as being more synchronous than those of systems in which both modes are not intrinsically coherent. Furthermore, it is shown that the degree of coherence between the auditory mode and the visual mode has an influence on the perceived quality of the synthetic visual speech fragment. In addition, the audio quality was found to have only a minor influence on the perceived visual signal's quality.
url	http://dx.doi.org/10.1155/2009/169819
