A Binaural Grouping Model for Predicting Speech Intelligibility in Multitalker Environments

Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. M...

Bibliographic Details
Main Authors:	Jing Mi, H. Steven Colburn
Format:	Article
Language:	English
Published:	SAGE Publishing 2016-09-01
Series:	Trends in Hearing
Online Access:	https://doi.org/10.1177/2331216516669919

id	doaj-378f222dca78404c91dbfebc02178cc6
record_format	Article
spelling	doaj-378f222dca78404c91dbfebc02178cc6 | 2020-11-25T03:39:18Z | eng | SAGE Publishing | Trends in Hearing | 2331-2165 | 2016-09-01 | 20 | 10.1177/2331216516669919 | 10.1177_2331216516669919 | A Binaural Grouping Model for Predicting Speech Intelligibility in Multitalker Environments | Jing Mi (Boston University, Boston, MA, USA) | H. Steven Colburn (Boston University, Boston, MA, USA) | https://doi.org/10.1177/2331216516669919
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Jing Mi H. Steven Colburn
title	A Binaural Grouping Model for Predicting Speech Intelligibility in Multitalker Environments
publisher	SAGE Publishing
series	Trends in Hearing
issn	2331-2165
publishDate	2016-09-01
description	Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires little computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model.
url	https://doi.org/10.1177/2331216516669919
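
The mask estimation summarized in the description can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the target is straight ahead, so the equalization step is the identity and cancellation reduces to subtracting the right-ear coefficient from the left-ear one. The function name `ec_binary_mask`, the 3 dB threshold, the energy normalization, and the representation of each time-frequency unit as a single complex coefficient per ear are all hypothetical choices made for illustration.

```python
import math


def ec_binary_mask(left_tf, right_tf, threshold_db=3.0, eps=1e-12):
    """Estimate a binary time-frequency mask from two-ear coefficients.

    left_tf, right_tf: lists of frames, each frame a list of complex
    coefficients (one per frequency channel). Assumption: the target is
    frontal, so no delay or gain equalization is applied before cancelling.
    threshold_db is a hypothetical decision threshold, not a published value.
    """
    mask = []
    for left_frame, right_frame in zip(left_tf, right_tf):
        row = []
        for l, r in zip(left_frame, right_frame):
            # Energy at the EC input: mean of the two ear energies.
            energy_in = 0.5 * (abs(l) ** 2 + abs(r) ** 2)
            # Energy at the EC output: residual after cancelling the target
            # direction, scaled so uncorrelated ears give about 0 dB change.
            energy_out = 0.5 * abs(l - r) ** 2
            # A large energy drop means the unit was dominated by the target.
            drop_db = 10.0 * math.log10((energy_in + eps) / (energy_out + eps))
            row.append(1 if drop_db > threshold_db else 0)
        mask.append(row)
    return mask


# Two frames, two channels: identical ears mimic a target-dominated unit,
# phase-shifted ears mimic a unit dominated by an off-axis masker.
left = [[1 + 0j, 1 + 0j], [1.1 + 0j, 1 + 0j]]
right = [[1 + 0j, 1j], [0.9 + 0j, -1 + 0j]]
mask = ec_binary_mask(left, right)
```

In this sketch, units whose energy falls sharply after cancellation are kept (mask value 1) and the rest are discarded. A fuller model would derive the coefficients from an auditory filterbank and steer the equalization toward an arbitrary target direction.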


Similar Items