Auditory-based algorithms for sound segregation in multisource and reverberant environments

Bibliographic Details
Main Author:	Roman, Nicoleta
Language:	English
Published:	The Ohio State University / OhioLINK 2005
Subjects:	computational auditory scene analysis (CASA); binaural speech segregation; monaural speech segregation; robust automatic speech segregation; adaptive filtering; room impulse response; reverberation
Online Access:	http://rave.ohiolink.edu/etdc/view?acc_num=osu1124370749

id	ndltd-OhioLink-oai-etd.ohiolink.edu-osu1124370749
record_format	oai_dc
spelling	ndltd-OhioLink-oai-etd.ohiolink.edu-osu1124370749 2021-08-03T05:50:17Z Auditory-based algorithms for sound segregation in multisource and reverberant environments Roman, Nicoleta computational auditory scene analysis (CASA) binaural speech segregation monaural speech segregation robust automatic speech segregation adaptive filtering room impulse response reverberation At a cocktail party, we can selectively attend to a single voice and filter out other interferences. This perceptual ability has motivated a new field of study known as computational auditory scene analysis (CASA), which aims to build speech separation systems that incorporate auditory principles. The psychological process of figure-ground segregation suggests that the target signal should be segregated as foreground while the remaining stimuli are treated as background. Accordingly, the computational goal of CASA should be to estimate an ideal time-frequency (T-F) binary mask, which selects the target if it is stronger than the interference in a local T-F unit. This dissertation investigates four aspects of CASA processing: location-based speech segregation, binaural tracking of multiple moving sources, binaural sound segregation in reverberation, and monaural segregation of reverberant speech. For localization, the auditory system utilizes the interaural time difference (ITD) and interaural intensity difference (IID) between the ears. We observe that within a narrow frequency band, modifications to the relative strength of the target source with respect to the interference trigger systematic changes in ITD and IID, resulting in a characteristic clustering. Consequently, we propose a supervised learning approach to estimate the ideal binary mask. A systematic evaluation shows that the resulting system produces masks very close to the ideal binary ones and large speech intelligibility improvements. In realistic environments, source motion requires consideration.
Binaural cues are strongly correlated with locations in T-F units dominated by one source, resulting in channel-dependent conditional probabilities. Consequently, we propose a multi-channel integration method of these probabilities in order to compute the likelihood function in a target space. Finally, a hidden Markov model is employed for forming continuous tracks and automatically detecting the number of active sources. Reverberation affects the ITD and IID cues. We therefore propose a binaural segregation system that combines target cancellation through adaptive filtering and a binary decision rule to estimate the ideal binary mask. A major advantage of the proposed system is that it imposes no restrictions on the interfering sources. Quantitative evaluations show that our system outperforms related beamforming approaches. Psychoacoustic evidence suggests that monaural processing plays a vital role in segregation. It is known that reverberation smears the harmonicity of speech signals. We therefore propose a two-stage separation system that combines inverse filtering of the target room impulse response with pitch-based segregation. As a result of the first stage, the harmonicity of a signal arriving from the target direction is partially restored while signals arriving from other locations are further smeared, and this leads to improved segregation and considerable signal-to-noise ratio gains. 2005-08-24 English text The Ohio State University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=osu1124370749 unrestricted This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.
collection	NDLTD
language	English
sources	NDLTD
topic	computational auditory scene analysis (CASA) binaural speech segregation monaural speech segregation robust automatic speech segregation adaptive filtering room impulse response reverberation
spellingShingle	computational auditory scene analysis (CASA) binaural speech segregation monaural speech segregation robust automatic speech segregation adaptive filtering room impulse response reverberation Roman, Nicoleta Auditory-based algorithms for sound segregation in multisource and reverberant environments
author	Roman, Nicoleta
author_facet	Roman, Nicoleta
author_sort	Roman, Nicoleta
title	Auditory-based algorithms for sound segregation in multisource and reverberant environments
title_short	Auditory-based algorithms for sound segregation in multisource and reverberant environments
title_full	Auditory-based algorithms for sound segregation in multisource and reverberant environments
title_fullStr	Auditory-based algorithms for sound segregation in multisource and reverberant environments
title_full_unstemmed	Auditory-based algorithms for sound segregation in multisource and reverberant environments
title_sort	auditory-based algorithms for sound segregation in multisource and reverberant environments
publisher	The Ohio State University / OhioLINK
publishDate	2005
url	http://rave.ohiolink.edu/etdc/view?acc_num=osu1124370749
work_keys_str_mv	AT romannicoleta auditorybasedalgorithmsforsoundsegregationinmultisourceandreverberantenvironments
_version_	1719426338168766464

Similar Items