Bio-inspired noise robust auditory features
The purpose of this work is to investigate a series of biologically inspired modifications to state-of-the-art Mel-frequency cepstral coefficients (MFCCs) that may improve automatic speech recognition results. We have provided recommendations to improve speech recognition results depending on sig...
Main Author: | |
---|---|
Published: | Georgia Institute of Technology, 2012 |
Subjects: | |
Online Access: | http://hdl.handle.net/1853/44801 |
id |
ndltd-GATECH-oai-smartech.gatech.edu-1853-44801 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-GATECH-oai-smartech.gatech.edu-1853-44801; 2013-01-10T17:18:15Z; Bio-inspired noise robust auditory features; Javadi, Ailar; Speech recognition; MFCCs; Noise-robust features; Feature extraction; Biologically-inspired computing; Automatic speech recognition; Computational auditory scene analysis; The purpose of this work is to investigate a series of biologically inspired modifications to state-of-the-art Mel-frequency cepstral coefficients (MFCCs) that may improve automatic speech recognition results. We have provided recommendations to improve speech recognition results depending on signal-to-noise ratio levels of input signals. This work has been motivated by noise-robust auditory features (NRAF). In the feature extraction technique, after a signal is filtered using bandpass filters, a spatial derivative step is used to sharpen the results, followed by an envelope detector (rectification and smoothing) and down-sampling for each filter bank before being compressed. DCT is then applied to the results of all filter banks to produce features. The Hidden Markov Model Toolkit (HTK) is used as the recognition back-end to perform speech recognition given the features we have extracted. In this work, we investigate the role of filter types, window size, spatial derivative, rectification types, smoothing, down-sampling and compression, and compare the final results to state-of-the-art Mel-frequency cepstral coefficients (MFCCs). A series of conclusions and insights are provided for each step of the process. The goal of this work has not been to outperform MFCCs; however, we have shown that by changing the compression type from log compression to 0.07 root compression we are able to outperform MFCCs for all noisy conditions.; Georgia Institute of Technology; 2012-09-20T18:20:28Z; 2012-09-20T18:20:28Z; 2012-06-12; Thesis; http://hdl.handle.net/1853/44801 |
collection |
NDLTD |
sources |
NDLTD |
topic |
Speech recognition; MFCCs; Noise-robust features; Feature extraction; Biologically-inspired computing; Automatic speech recognition; Computational auditory scene analysis |
spellingShingle |
Speech recognition; MFCCs; Noise-robust features; Feature extraction; Biologically-inspired computing; Automatic speech recognition; Computational auditory scene analysis; Javadi, Ailar; Bio-inspired noise robust auditory features |
description |
The purpose of this work is to investigate a series of biologically inspired modifications to state-of-the-art Mel-frequency cepstral coefficients (MFCCs) that may improve automatic speech recognition results. We have provided recommendations to improve speech recognition results depending on signal-to-noise ratio levels of input signals. This work has been motivated by noise-robust auditory features (NRAF). In the feature extraction technique, after a signal is filtered using bandpass filters, a spatial derivative step is used to sharpen the results, followed by an envelope detector (rectification and smoothing) and down-sampling for each filter bank before being compressed. DCT is then applied to the results of all filter banks to produce features. The Hidden Markov Model Toolkit (HTK) is used as the recognition back-end to perform speech recognition given the features we have extracted. In this work, we investigate the role of filter types, window size, spatial derivative, rectification types, smoothing, down-sampling and compression, and compare the final results to state-of-the-art Mel-frequency cepstral coefficients (MFCCs). A series of conclusions and insights are provided for each step of the process. The goal of this work has not been to outperform MFCCs; however, we have shown that by changing the compression type from log compression to 0.07 root compression we are able to outperform MFCCs for all noisy conditions. |
author |
Javadi, Ailar |
author_facet |
Javadi, Ailar |
author_sort |
Javadi, Ailar |
title |
Bio-inspired noise robust auditory features |
title_short |
Bio-inspired noise robust auditory features |
title_full |
Bio-inspired noise robust auditory features |
title_fullStr |
Bio-inspired noise robust auditory features |
title_full_unstemmed |
Bio-inspired noise robust auditory features |
title_sort |
bio-inspired noise robust auditory features |
publisher |
Georgia Institute of Technology |
publishDate |
2012 |
url |
http://hdl.handle.net/1853/44801 |
work_keys_str_mv |
AT javadiailar bioinspirednoiserobustauditoryfeatures |
_version_ |
1716574275767369728 |
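
The feature extraction steps named in the description (spatial derivative, envelope detection by rectification and smoothing, down-sampling, compression, then a DCT across filter banks) can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes the band-pass filtering has already been done and starts from per-channel signals. The adjacent-channel difference, half-wave rectification, one-pole smoothing, and the values of `alpha`, `step`, and `floor` are illustrative assumptions. All function names are hypothetical. Only the 0.07 root exponent and the log alternative come from the record.

```python
import math

def spatial_derivative(channels):
    """Difference of adjacent filter-bank channels (assumed form of the sharpening step)."""
    return [[a - b for a, b in zip(lo, hi)] for lo, hi in zip(channels, channels[1:])]

def half_wave_rectify(x):
    """Keep positive samples only (one of several possible rectification types)."""
    return [max(v, 0.0) for v in x]

def smooth(x, alpha=0.05):
    """One-pole low-pass filter; alpha is an assumed smoothing constant."""
    y, state = [], 0.0
    for v in x:
        state += alpha * (v - state)
        y.append(state)
    return y

def downsample(x, step):
    """Keep every step-th sample of the smoothed envelope."""
    return x[::step]

def compress(v, mode="root", power=0.07, floor=1e-10):
    """Log compression or root compression (exponent 0.07 as in the abstract)."""
    v = max(v, floor)  # assumed floor, avoids log(0)
    return math.log(v) if mode == "log" else v ** power

def dct_ii(x):
    """Unnormalised DCT-II of a short vector."""
    n = len(x)
    return [sum(x[k] * math.cos(math.pi * (k + 0.5) * i / n) for k in range(n))
            for i in range(n)]

def extract_features(channels, step=80, mode="root", n_coeffs=None):
    """Return one DCT feature vector per down-sampled frame."""
    sharpened = spatial_derivative(channels)
    envelopes = [downsample(smooth(half_wave_rectify(ch)), step) for ch in sharpened]
    frames = []
    for t in range(len(envelopes[0])):
        column = [compress(env[t], mode) for env in envelopes]
        coeffs = dct_ii(column)
        frames.append(coeffs[:n_coeffs] if n_coeffs else coeffs)
    return frames

# Toy input: four sinusoids standing in for band-passed channel outputs.
fs = 8000
centers = [300.0, 600.0, 1200.0, 2400.0]
toy_channels = [[math.sin(2 * math.pi * f * n / fs) / (i + 1) for n in range(800)]
                for i, f in enumerate(centers)]
toy_features = extract_features(toy_channels, step=80)
print(len(toy_features), len(toy_features[0]))  # prints "10 3"
```

Passing `mode="log"` to `extract_features` swaps in log compression, so the two compression types compared in the abstract can be run on the same input. The toy sinusoids only exercise the code path and say nothing about recognition accuracy.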