A Binaural Grouping Model for Predicting Speech Intelligibility in Multitalker Environments


Bibliographic Details
Main Authors: Jing Mi, H. Steven Colburn
Format: Article
Language: English
Published: SAGE Publishing 2016-09-01
Series: Trends in Hearing
Online Access: https://doi.org/10.1177/2331216516669919
id doaj-378f222dca78404c91dbfebc02178cc6
record_format Article
spelling doaj-378f222dca78404c91dbfebc02178cc6; 2020-11-25T03:39:18Z; eng; SAGE Publishing; Trends in Hearing; 2331-2165; 2016-09-01; 20; 10.1177/2331216516669919; 10.1177_2331216516669919; Jing Mi (Boston University, Boston, MA, USA); H. Steven Colburn (Boston University, Boston, MA, USA)
collection DOAJ
sources DOAJ
issn 2331-2165
publishDate 2016-09-01
description Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal, and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires few computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model.
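The mask-estimation idea in the description (cancel the target with EC processing, then use the input-to-output energy change as a target-dominance feature) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single frequency band that has already been filtered, a target straight ahead (so equalization is the identity and cancellation is a plain left-minus-right subtraction), and no internal noise. The function names, the frame length, and the -10 dB threshold are hypothetical choices made only for illustration.

```python
import math


def ec_energy_change_db(left, right, eps=1e-12):
    """Energy change in dB from EC input to EC output for one unit.

    Assumption: the target has zero interaural time and level difference,
    so equalization is the identity and cancellation is left - right.
    """
    # Input energy: mean of the two ear-signal energies.
    e_in = sum(l * l + r * r for l, r in zip(left, right)) / 2.0
    # Output energy: what remains after cancelling the target direction.
    e_out = sum((l - r) ** 2 for l, r in zip(left, right))
    return 10.0 * math.log10((e_out + eps) / (e_in + eps))


def ec_binary_mask(left, right, frame_len, threshold_db=-10.0):
    """Return one 0/1 decision per time frame of a single band.

    A large energy drop after cancellation means the frame was dominated
    by the target direction, so the frame is kept (mask value 1).
    threshold_db is a hypothetical value, not taken from the article.
    """
    n_frames = min(len(left), len(right)) // frame_len
    mask = []
    for k in range(n_frames):
        seg = slice(k * frame_len, (k + 1) * frame_len)
        change = ec_energy_change_db(left[seg], right[seg])
        mask.append(1 if change < threshold_db else 0)
    return mask


# Toy input: frame 1 holds a frontal target (identical at both ears),
# frame 2 holds a fully lateralized masker (left ear only).
frame_len = 64
target = [math.sin(2 * math.pi * 4 * n / frame_len) for n in range(frame_len)]
masker = [math.sin(2 * math.pi * 7 * n / frame_len) for n in range(frame_len)]
left = target + masker
right = target + [0.0] * frame_len

mask = ec_binary_mask(left, right, frame_len)
print(mask)  # [1, 0]
```

In the toy input, cancellation removes the frontal target completely, so the first frame shows a large energy drop and is selected. The lateralized masker survives cancellation (its energy rises by about 3 dB relative to the two-ear mean), so the second frame is rejected. A fuller model would repeat this per band of an auditory filterbank and equalize for a target direction other than straight ahead.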