Enhancing Speech Features in Various Domains for Noise-Robust Speech Recognition
Ph.D. === National Chi Nan University === Department of Electrical Engineering === 100 === The performance of an automatic speech recognition (ASR) system is often degraded by the various types of noise and interference in the application environment. In this dissertation, we aim to develop robustness methods specifically for handling additive...
Main Authors: | Wen-Hsiang Tu, 杜文祥 |
---|---|
Other Authors: | Jeih-weih Hung, 洪志偉 |
Format: | Others |
Language: | en_US |
Published: | 2012 |
Online Access: | http://ndltd.ncl.edu.tw/handle/66756772463135462510 |
id |
ndltd-TW-100NCNU0442106 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-100NCNU0442106 2015-10-13T21:07:20Z http://ndltd.ncl.edu.tw/handle/66756772463135462510 Enhancing Speech Features in Various Domains for Noise-Robust Speech Recognition 強化各域之語音特徵於雜訊強健性語音辨識之研究 Wen-Hsiang Tu 杜文祥 Ph.D. National Chi Nan University Department of Electrical Engineering 100 The performance of an automatic speech recognition (ASR) system is often degraded by the various types of noise and interference in the application environment. In this dissertation, we aim to develop robustness methods specifically for handling additive noise and channel disturbance. In particular, the developed methods are used to refine the mel-frequency cepstral coefficients (MFCC), one of the most widely used speech feature representations in ASR. First, we discuss the effect of noise in the linear spectral domain of MFCC, and then present magnitude spectrum enhancement (MSE) to refine the spectrum of speech signals. Next, hybrid cepstral statistics normalization is presented to process the MFCC in the mel-spectral domain. Finally, two novel compensation algorithms, modulation spectrum replacement (MSR) and modulation spectrum filtering (MSF), are provided to enhance the MFCC in the cepstral domain. Recognition experiments conducted on the Aurora-2 connected-digit database show that these methods improve the recognition accuracy of the MFCC in various noise conditions, and in most cases they perform better than, or at least similarly to, state-of-the-art noise robustness techniques such as Wiener filtering (WF), spectral subtraction (SS), mean and variance normalization (MVN) and histogram equalization (HEQ). Jeih-weih Hung 洪志偉 2012 thesis 81 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Ph.D. === National Chi Nan University === Department of Electrical Engineering === 100 === The performance of an automatic speech recognition (ASR) system is often degraded by the various types of noise and interference in the application environment. In this dissertation, we aim to develop robustness methods specifically for handling additive noise and channel disturbance. In particular, the developed methods are used to refine the mel-frequency cepstral coefficients (MFCC), one of the most widely used speech feature representations in ASR.
First, we discuss the effect of noise in the linear spectral domain of MFCC, and then present magnitude spectrum enhancement (MSE) to refine the spectrum of speech signals. Next, hybrid cepstral statistics normalization is presented to process the MFCC in the mel-spectral domain. Finally, two novel compensation algorithms, modulation spectrum replacement (MSR) and modulation spectrum filtering (MSF), are provided to enhance the MFCC in the cepstral domain. Recognition experiments conducted on the Aurora-2 connected-digit database show that these methods improve the recognition accuracy of the MFCC in various noise conditions, and in most cases they perform better than, or at least similarly to, state-of-the-art noise robustness techniques such as Wiener filtering (WF), spectral subtraction (SS), mean and variance normalization (MVN) and histogram equalization (HEQ).
|
author2 |
Jeih-weih Hung |
author_facet |
Jeih-weih Hung Wen-Hsiang Tu 杜文祥 |
author |
Wen-Hsiang Tu 杜文祥 |
spellingShingle |
Wen-Hsiang Tu 杜文祥 Enhancing Speech Features in Various Domains for Noise-Robust Speech Recognition |
author_sort |
Wen-Hsiang Tu |
title |
Enhancing Speech Features in Various Domains for Noise-Robust Speech Recognition |
title_sort |
enhancing speech features in various domains for noise-robust speech recognition |
publishDate |
2012 |
url |
http://ndltd.ncl.edu.tw/handle/66756772463135462510 |
work_keys_str_mv |
AT wenhsiangtu enhancingspeechfeaturesinvariousdomainsfornoiserobustspeechrecognition AT dùwénxiáng enhancingspeechfeaturesinvariousdomainsfornoiserobustspeechrecognition AT wenhsiangtu qiánghuàgèyùzhīyǔyīntèzhēngyúzáxùnqiángjiànxìngyǔyīnbiànshízhīyánjiū AT dùwénxiáng qiánghuàgèyùzhīyǔyīntèzhēngyúzáxùnqiángjiànxìngyǔyīnbiànshízhīyánjiū |
_version_ |
1718056146102648832 |
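
The abstract lists mean and variance normalization (MVN) among the baseline techniques it compares against. As a rough illustration only, and not the thesis's own implementation, the sketch below applies per-utterance MVN to a sequence of feature frames. The function name `mvn`, the list-of-lists frame layout, and the guard for zero variance are assumptions made for this example.

```python
from statistics import fmean, pstdev


def mvn(features):
    """Per-utterance mean and variance normalization (illustrative sketch).

    `features` is a list of frames; each frame is a list of cepstral
    coefficients. Each coefficient track is shifted to zero mean and
    scaled to unit variance over the whole utterance.
    """
    if not features:
        return []
    # Transpose so that each tuple holds one coefficient across all frames.
    tracks = list(zip(*features))
    means = [fmean(t) for t in tracks]
    # Assumption: a constant track (standard deviation 0) is left unscaled.
    stds = [pstdev(t) or 1.0 for t in tracks]
    return [
        [(x - m) / s for x, m, s in zip(frame, means, stds)]
        for frame in features
    ]


# Hypothetical two-dimensional features for a three-frame utterance.
frames = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
normalized = mvn(frames)
```

Statistics here are computed over a single utterance; other variants estimate them over a sliding window or a whole corpus, and the abstract does not say which form was used as the baseline.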