Intelligibility model optimisation approaches for speech pre-enhancement

The goal of improving the intelligibility of broadcast speech is being met by a recent new direction in speech enhancement: near-end intelligibility enhancement. In contrast to the conventional speech enhancement approach that processes the corrupted speech at the receiver-side of the communication...

Bibliographic Details
Main Author:	Al Dabel, Maryam
Other Authors:	Barker, Jon
Published:	University of Sheffield 2016
Subjects:	006.3
Online Access:	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.701746

id	ndltd-bl.uk-oai-ethos.bl.uk-701746
record_format	oai_dc
spelling	ndltd-bl.uk-oai-ethos.bl.uk-701746; 2018-06-06T15:29:54Z; Intelligibility model optimisation approaches for speech pre-enhancement; Al Dabel, Maryam; Barker, Jon; 2016; 006.3; University of Sheffield; http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.701746; http://etheses.whiterose.ac.uk/15830/; Electronic Thesis or Dissertation
collection	NDLTD
sources	NDLTD
topic	006.3
description	The goal of improving the intelligibility of broadcast speech is being met by a recent new direction in speech enhancement: near-end intelligibility enhancement. In contrast to the conventional speech enhancement approach that processes the corrupted speech at the receiver-side of the communication chain, the near-end intelligibility enhancement approach pre-processes the clean speech at the transmitter-side, i.e. before it is played into the environmental noise. In this work, we describe an optimisation-based approach to near-end intelligibility enhancement using models of speech intelligibility to improve the intelligibility of speech in noise. This thesis first presents a survey of speech intelligibility models and how adverse acoustic conditions affect the intelligibility of speech. The purpose of this survey is to identify models that we can adopt in the design of the pre-enhancement system. Then, we investigate the strategies humans use to increase speech intelligibility in noise. We then relate human strategies to existing algorithms for near-end intelligibility enhancement. A closed-loop feedback approach to near-end intelligibility enhancement is then introduced. In this framework, speech modifications are guided by a model of intelligibility. For the closed-loop system to work, we develop a simple spectral modification strategy that modifies the first few coefficients of an auditory cepstral representation so as to maximise an intelligibility measure. We experiment with two contrasting measures of objective intelligibility. The first, as a baseline, is an audibility measure named 'glimpse proportion' that is computed as the proportion of the spectro-temporal representation of the speech signal that is free from masking.
We then propose a discriminative intelligibility model, building on the principles of missing data speech recognition, to model the likelihood of specific phonetic confusions that may occur when speech is presented in noise. The discriminative intelligibility measure is computed using a statistical model of speech from the speaker that is to be enhanced. Interim results showed that, unlike the glimpse-proportion-based system, the discriminative-based system did not improve intelligibility. We investigated the reason and found that the discriminative-based system was not able to target the phonetic confusions with fixed spectral shaping. To address this, we introduce a time-varying spectral modification. We also propose to perform the optimisation on a segment-by-segment basis, which makes the solution robust to fluctuating noise. We further combine our system with a noise-independent enhancement technique, i.e. dynamic range compression. We found a significant improvement in the non-stationary noise condition, but no significant differences from the state-of-the-art system (spectral shaping and dynamic range compression) were found in the stationary noise condition.
author2	Barker, Jon
author_facet	Barker, Jon Al Dabel, Maryam
author	Al Dabel, Maryam
author_sort	Al Dabel, Maryam
title	Intelligibility model optimisation approaches for speech pre-enhancement
publisher	University of Sheffield
publishDate	2016
url	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.701746

Similar Items