A Study on Robust Audio Processing: From Signal Enhancement to Model Learning
強健性音訊處理研究:從訊號增強到模型學習
Ph.D. dissertation === National Central University === Department of Computer Science and Information Engineering === Academic year 105
Main Authors: | Yuan-Shan Lee (李遠山) |
---|---|
Other Authors: | Jia-Ching Wang (王家慶) |
Format: | Others (doctoral dissertation, 144 pages) |
Language: | en_US |
Published: | 2017 |
Online Access: | http://ndltd.ncl.edu.tw/handle/9r5bcc |
Record ID: | ndltd-TW-105NCU05392149 |
Description:
Robustness against noise is a critical characteristic of an audio recognition (AR) system. To develop a robust AR system, this dissertation proposes two front-end processing methods. To suppress the effect of background noise on the target sound, a speech enhancement method based on compressive sensing (CS) is proposed. A quasi-SNR criterion is first used to determine whether each frequency bin in the spectrogram is reliable, and a corresponding mask is designed. The mask-extracted spectral components are regarded as partial observations, and CS theory is used to reconstruct the components that are missing from them. The noise component can be further removed by multiplying the imputed spectrum by an optimized gain. To separate the target sound from interference, a source separation method based on a complex-valued deep recurrent neural network (C-DRNN) is developed. A key aspect of the C-DRNN is that its activations and weights are complex-valued; phase estimation is integrated into the network by constructing a deep, complex-valued regression model in the time-frequency domain.

This dissertation also develops two novel methods for back-end recognition. The first is a joint kernel dictionary learning (JKDL) method for sound event classification. JKDL learns a collaborative representation instead of a sparse representation, so the learned representation is "denser" than the sparse representation learned by K-SVD; its discriminative ability is further improved by adding a classification error term to the objective function. The second is a hierarchical Dirichlet process mixture model (HDPMM), whose components can be shared among the models of the individual audio categories, so the proposed emotion models better capture the relationships among real-world emotional states.
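To make the enhancement front end concrete, here is a minimal sketch of the quasi-SNR mask, spectral imputation, and gain stages. It illustrates the idea in the abstract, not the dissertation's implementation: the function name `enhance`, the threshold `snr_thresh_db`, the noise estimate taken from the leading frames, and the least-squares fit on a truncated DCT basis (standing in for a proper compressive-sensing reconstruction of the missing bins) are all assumptions introduced here.

```python
import numpy as np
from scipy.signal import stft, istft
from scipy.fft import dct


def enhance(x, fs, snr_thresh_db=0.0, n_noise_frames=10, n_basis=32):
    """Illustrative quasi-SNR masking, spectral imputation, and gain."""
    f, t, X = stft(x, fs=fs, nperseg=512)            # complex spectrogram (F, T)
    P = np.abs(X) ** 2                                # power spectrogram
    noise_psd = P[:, :n_noise_frames].mean(axis=1)    # crude noise estimate

    # Quasi-SNR reliability mask: a bin is "reliable" when its local SNR
    # exceeds the threshold.
    snr_db = 10.0 * np.log10(P / (noise_psd[:, None] + 1e-12) + 1e-12)
    reliable = snr_db > snr_thresh_db

    # Low-frequency DCT basis used to impute unreliable magnitude bins;
    # a real CS approach would use a sparsity-promoting solver instead.
    n_bins = X.shape[0]
    D = dct(np.eye(n_bins), norm="ortho")[:, :n_basis]

    mag = np.abs(X).copy()
    for j in range(X.shape[1]):
        m = reliable[:, j]
        if m.sum() < n_basis or m.all():
            continue                                  # too few reliable bins, or nothing to impute
        coeffs, *_ = np.linalg.lstsq(D[m], mag[m, j], rcond=None)
        imputed = D @ coeffs
        mag[~m, j] = np.maximum(imputed[~m], 0.0)     # fill in the unreliable bins

    # Wiener-style gain applied to the imputed spectrum; input phase is reused.
    gain = np.clip(1.0 - noise_psd[:, None] / (mag ** 2 + 1e-12), 0.1, 1.0)
    X_hat = gain * mag * np.exp(1j * np.angle(X))
    _, x_hat = istft(X_hat, fs=fs, nperseg=512)
    return x_hat
```

Under these assumptions, `enhance(noisy, 16000)` would return a waveform in which unreliable time-frequency bins were re-estimated from the reliable ones before the gain was applied.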
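The separation front end hinges on keeping every quantity complex-valued so that phase is regressed together with magnitude. The step below is a generic complex-valued recurrent layer written in NumPy purely as an illustration; the split real/imaginary tanh activation, the single layer, and the direct regression onto the target spectrum are simplifying assumptions, not the C-DRNN's exact architecture.

```python
import numpy as np


def split_tanh(z):
    """Apply tanh separately to real and imaginary parts.

    One common choice of complex activation; other variants act on the
    magnitude and preserve the phase.
    """
    return np.tanh(z.real) + 1j * np.tanh(z.imag)


def complex_rnn_separate(X, W_in, W_rec, W_out, b, b_out):
    """Run one complex-valued recurrent layer over a mixture spectrogram.

    X     : (T, F) complex STFT frames of the mixture.
    W_in  : (H, F) complex input weights.
    W_rec : (H, H) complex recurrent weights.
    W_out : (F, H) complex output weights (regression onto the target
            complex spectrum, so phase is estimated jointly).
    Returns the estimated target spectrogram of shape (T, F).
    """
    T, F = X.shape
    H = W_rec.shape[0]
    h = np.zeros(H, dtype=np.complex128)
    S_hat = np.zeros_like(X)
    for step in range(T):
        # Complex affine transform followed by a complex activation.
        h = split_tanh(W_in @ X[step] + W_rec @ h + b)
        # Complex-valued linear regression onto the target spectrum.
        S_hat[step] = W_out @ h + b_out
    return S_hat
```

In practice the complex weights would be learned by minimizing a regression loss between `S_hat` and the clean target spectrogram, which is what lets the network estimate phase rather than reusing the mixture's phase.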
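For the back-end, the JKDL description (a collaborative representation plus a classification error term in the objective) can be written schematically as a discriminative kernel dictionary learning problem. The formulation below follows common practice for such objectives and is only a sketch; the symbols, the weights $\lambda$ and $\beta$, and the exact kernelization are assumptions, not the dissertation's notation.

```latex
\min_{\mathbf{D},\,\mathbf{W},\,\mathbf{A}}\;
  \bigl\lVert \phi(\mathbf{Y}) - \phi(\mathbf{D})\,\mathbf{A} \bigr\rVert_F^2
  \;+\; \lambda\,\lVert \mathbf{A} \rVert_F^2
  \;+\; \beta\,\lVert \mathbf{H} - \mathbf{W}\mathbf{A} \rVert_F^2
```

Here $\mathbf{Y}$ are the training features, $\phi(\cdot)$ the kernel feature map, $\mathbf{D}$ the dictionary, $\mathbf{A}$ the collaborative codes (an $\ell_2$ penalty on $\mathbf{A}$ rather than the $\ell_0/\ell_1$ penalty used by K-SVD, which is why the representation is "denser"), $\mathbf{H}$ the label matrix, and $\mathbf{W}$ a linear classifier learned jointly with the dictionary.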