Visual Speech Discrimination and Identification of Natural and Synthetic Consonant Stimuli

From phonetic features to connected discourse, every level of psycholinguistic structure including prosody can be perceived through viewing the talking face. Yet a longstanding notion in the literature is that visual speech perceptual categories comprise groups of phonemes (referred to as visemes), such as /p, b, m/ and /f, v/, whose internal structure is not informative to the visual speech perceiver. This conclusion has not to our knowledge been evaluated using a psychophysical discrimination paradigm. We hypothesized that perceivers can discriminate the phonemes within typical viseme groups, and that discrimination measured with d-prime (d') and response latency is related to visual stimulus dissimilarities between consonant segments. In Experiment 1, participants performed speeded discrimination for pairs of consonant-vowel (CV) spoken nonsense syllables that were predicted to be same, near, or far in their perceptual distances, and that were presented as natural or synthesized video. Near pairs were within-viseme consonants. Natural within-viseme stimulus pairs were discriminated significantly above chance (except for /k/-/h/). Sensitivity (d') increased and response times decreased with distance. Discrimination and identification were superior with natural stimuli, which comprised more phonetic information. We suggest that the notion of the viseme as a unitary perceptual category is incorrect. Experiment 2 probed the perceptual basis for visual speech discrimination by inverting the stimuli. Overall reductions in d' with inverted stimuli but a persistent pattern of larger d' for far than for near stimulus pairs are interpreted as evidence that visual speech is represented by both its motion and configural attributes. The methods and results of this investigation open up avenues for understanding the neural and perceptual bases for visual and audiovisual speech perception and for development of practical applications such as visual speech synthesis.

Bibliographic Details
Main Authors: Benjamin T. Files, Bosco Tjan, Jintao Jiang, Lynne E. Bernstein
Format: Article
Language: English
Published: Frontiers Media S.A., 2015-07-01
Series: Frontiers in Psychology
Subjects: Speech Perception; Visual Perception; Discrimination; multisensory perception; multisensory processing; Lipreading
Online Access: http://journal.frontiersin.org/Journal/10.3389/fpsyg.2015.00878/full
id doaj-9d49343455e14be0a5e22dbb7fd9206c
record_format Article
affiliations Benjamin T. Files (Army Research Laboratory); Bosco Tjan (University of Southern California); Jintao Jiang (Applications Technology (AppTek)); Lynne E. Bernstein (George Washington University)
collection DOAJ
language English
format Article
sources DOAJ
author Benjamin T. Files
Bosco Tjan
Jintao Jiang
Lynne E. Bernstein
author_sort Benjamin T. Files
title Visual Speech Discrimination and Identification of Natural and Synthetic Consonant Stimuli
publisher Frontiers Media S.A.
series Frontiers in Psychology
issn 1664-1078
publishDate 2015-07-01
description From phonetic features to connected discourse, every level of psycholinguistic structure including prosody can be perceived through viewing the talking face. Yet a longstanding notion in the literature is that visual speech perceptual categories comprise groups of phonemes (referred to as visemes), such as /p, b, m/ and /f, v/, whose internal structure is not informative to the visual speech perceiver. This conclusion has not to our knowledge been evaluated using a psychophysical discrimination paradigm. We hypothesized that perceivers can discriminate the phonemes within typical viseme groups, and that discrimination measured with d-prime (d’) and response latency is related to visual stimulus dissimilarities between consonant segments. In Experiment 1, participants performed speeded discrimination for pairs of consonant-vowel (CV) spoken nonsense syllables that were predicted to be same, near, or far in their perceptual distances, and that were presented as natural or synthesized video. Near pairs were within-viseme consonants. Natural within-viseme stimulus pairs were discriminated significantly above chance (except for /k/-/h/). Sensitivity (d’) increased and response times decreased with distance. Discrimination and identification were superior with natural stimuli, which comprised more phonetic information. We suggest that the notion of the viseme as a unitary perceptual category is incorrect. Experiment 2 probed the perceptual basis for visual speech discrimination by inverting the stimuli. Overall reductions in d’ with inverted stimuli but a persistent pattern of larger d’ for far than for near stimulus pairs are interpreted as evidence that visual speech is represented by both its motion and configural attributes. The methods and results of this investigation open up avenues for understanding the neural and perceptual bases for visual and audiovisual speech perception and for development of practical applications such as visual speech synthesis.
topic Speech Perception
Visual Perception
Discrimination
multisensory perception
multisensory processing
Lipreading
url http://journal.frontiersin.org/Journal/10.3389/fpsyg.2015.00878/full