A Study on Robust Audio Processing: From Signal Enhancement to Model Learning


Bibliographic Details
Main Authors: Yuan-Shan Lee, 李遠山
Other Authors: Jia-Ching Wang
Format: Others
Language: en_US
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/9r5bcc
id ndltd-TW-105NCU05392149
record_format oai_dc
spelling ndltd-TW-105NCU053921492019-05-16T00:08:09Z http://ndltd.ncl.edu.tw/handle/9r5bcc A Study on Robust Audio Processing: From Signal Enhancement to Model Learning 強健性音訊處理研究:從訊號增強到模型學習 Yuan-Shan Lee 李遠山 Doctoral National Central University Department of Computer Science and Information Engineering 105 Robustness against noise is a critical characteristic of an audio recognition (AR) system. To develop a robust AR system, this dissertation proposes two front-end processing methods. To suppress the effects of background noise on the target sound, a speech enhancement method based on compressive sensing (CS) is proposed. A quasi-SNR criterion is first utilized to determine whether a frequency bin in the spectrogram is reliable, and a corresponding mask is designed. The mask-extracted spectral components are regarded as partial observations. CS theory is then used to reconstruct the components that are missing from the partial observations. The noise component can be further removed by multiplying the imputed spectrum by an optimized gain. To separate the target sound from interference, a source separation method based on a complex-valued deep recurrent neural network (C-DRNN) is developed. A key aspect of the C-DRNN is that its activations and weights are complex-valued. Phase estimation is integrated into the C-DRNN through the construction of a deep, complex-valued regression model in the time-frequency domain. This dissertation also develops two novel methods for back-end recognition. The first is a joint kernel dictionary learning (JKDL) method for sound event classification. JKDL learns a collaborative representation instead of a sparse representation; the learned representation is thus "denser" than the sparse representation learned by K-SVD. Moreover, discriminative ability is improved by adding a classification error term to the objective function. The second is a hierarchical Dirichlet process mixture model (HDPMM), whose components can be shared among the models of the audio categories.
Therefore, the proposed emotion models better capture the relationships among real-world emotional states. Jia-Ching Wang 王家慶 2017 Dissertation ; thesis 144 en_US
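The mask-then-gain front end summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the dissertation's exact formulation: the a-posteriori quasi-SNR, the 0 dB reliability threshold, and the Wiener-style gain are all assumptions made for the sketch, and the CS-based imputation of the unreliable bins is omitted entirely.

```python
import numpy as np

def enhance_frame(noisy_mag, noise_mag, snr_threshold_db=0.0):
    """Illustrative mask-then-gain enhancement of one spectral frame.

    noisy_mag, noise_mag: 1-D magnitude spectra of a noisy frame and a
    noise estimate.  Bins whose quasi-SNR falls below the threshold are
    treated as unreliable and zeroed (in the dissertation these would be
    imputed via CS); reliable bins are attenuated by a Wiener-style gain,
    a stand-in for the "optimized gain" mentioned in the abstract.
    """
    eps = 1e-10
    # A-posteriori SNR per frequency bin (one possible quasi-SNR criterion).
    snr = noisy_mag**2 / (noise_mag**2 + eps)
    # Reliability mask: keep bins whose SNR exceeds the threshold (in dB).
    mask = 10.0 * np.log10(snr + eps) > snr_threshold_db
    # Wiener-style gain, clipped to be non-negative.
    gain = np.maximum(1.0 - 1.0 / (snr + eps), 0.0)
    return np.where(mask, gain * noisy_mag, 0.0), mask
```

A high-SNR bin passes through nearly unchanged while a bin dominated by noise is suppressed; the real method would reconstruct the suppressed bins from the reliable ones via compressive sensing rather than zeroing them.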
collection NDLTD
language en_US
format Others
sources NDLTD
description Doctoral === National Central University === Department of Computer Science and Information Engineering === 105 === Robustness against noise is a critical characteristic of an audio recognition (AR) system. To develop a robust AR system, this dissertation proposes two front-end processing methods. To suppress the effects of background noise on the target sound, a speech enhancement method based on compressive sensing (CS) is proposed. A quasi-SNR criterion is first utilized to determine whether a frequency bin in the spectrogram is reliable, and a corresponding mask is designed. The mask-extracted spectral components are regarded as partial observations. CS theory is then used to reconstruct the components that are missing from the partial observations. The noise component can be further removed by multiplying the imputed spectrum by an optimized gain. To separate the target sound from interference, a source separation method based on a complex-valued deep recurrent neural network (C-DRNN) is developed. A key aspect of the C-DRNN is that its activations and weights are complex-valued. Phase estimation is integrated into the C-DRNN through the construction of a deep, complex-valued regression model in the time-frequency domain. This dissertation also develops two novel methods for back-end recognition. The first is a joint kernel dictionary learning (JKDL) method for sound event classification. JKDL learns a collaborative representation instead of a sparse representation; the learned representation is thus "denser" than the sparse representation learned by K-SVD. Moreover, discriminative ability is improved by adding a classification error term to the objective function. The second is a hierarchical Dirichlet process mixture model (HDPMM), whose components can be shared among the models of the audio categories. Therefore, the proposed emotion models better capture the relationships among real-world emotional states.
author2 Jia-Ching Wang
author_facet Jia-Ching Wang
Yuan-Shan Lee
李遠山
author Yuan-Shan Lee
李遠山
spellingShingle Yuan-Shan Lee
李遠山
A Study on Robust Audio Processing: From Signal Enhancement to Model Learning
author_sort Yuan-Shan Lee
title A Study on Robust Audio Processing: From Signal Enhancement to Model Learning
title_short A Study on Robust Audio Processing: From Signal Enhancement to Model Learning
title_full A Study on Robust Audio Processing: From Signal Enhancement to Model Learning
title_fullStr A Study on Robust Audio Processing: From Signal Enhancement to Model Learning
title_full_unstemmed A Study on Robust Audio Processing: From Signal Enhancement to Model Learning
title_sort study on robust audio processing: from signal enhancement to model learning
publishDate 2017
url http://ndltd.ncl.edu.tw/handle/9r5bcc
work_keys_str_mv AT yuanshanlee astudyonrobustaudioprocessingfromsignalenhancementtomodellearning
AT lǐyuǎnshān astudyonrobustaudioprocessingfromsignalenhancementtomodellearning
AT yuanshanlee qiángjiànxìngyīnxùnchùlǐyánjiūcóngxùnhàozēngqiángdàomóxíngxuéxí
AT lǐyuǎnshān qiángjiànxìngyīnxùnchùlǐyánjiūcóngxùnhàozēngqiángdàomóxíngxuéxí
AT yuanshanlee studyonrobustaudioprocessingfromsignalenhancementtomodellearning
AT lǐyuǎnshān studyonrobustaudioprocessingfromsignalenhancementtomodellearning
_version_ 1719160904107425792