Performance evaluation of Hindi speech recognition system using optimized filterbanks

Bibliographic Details
Main Authors: Mohit Dua, Rajesh Kumar Aggarwal, Mantosh Biswas
Format: Article
Language: English
Published: Elsevier 2018-06-01
Series: Engineering Science and Technology, an International Journal
Online Access: http://www.sciencedirect.com/science/article/pii/S2215098617318281
Affiliation: Department of Computer Engineering, National Institute of Technology, Kurukshetra, India (all three authors; Mohit Dua is the corresponding author)
Volume: 21
Issue: 3
Pages: 389-398
ISSN: 2215-0986
Source: DOAJ

Abstract

An Automatic Speech Recognition (ASR) system implementation uses a conventional pattern recognition technique that stores a set of training patterns in classes and compares test patterns with the training patterns to place each one in the best-matched pattern class. Most state-of-the-art ASR systems use Mel Frequency Cepstral Coefficient (MFCC) and Perceptual Linear Prediction (PLP) features in the training phase. However, the sensitivity of MFCC and PLP to background noise has led to the use of the noise-robust features Gammatone Frequency Cepstral Coefficient (GFCC) and Basilar-membrane Frequency-band Cepstral Coefficient (BFCC). Many issues associated with these feature extraction methods, such as the accepted bandwidth and the standard number of filters, remain unresolved to date. This paper proposes a novel approach that uses the Differential Evolution (DE) algorithm to optimize the number and spacing of the filters used in the MFCC, GFCC and BFCC techniques. It also evaluates the performance of these feature extraction methods with and without DE optimization in clean as well as noisy environments. The results show that BFCC-based ASR systems perform 0.4% to 1.0% better than GFCC-based systems and 7% to 10% better than MFCC-based systems under different conditions.

Keywords: Automatic speech recognition, MFCC, GFCC, BFCC, Differential evolution
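The abstract states that DE tunes the number and spacing of filters, but it gives no encoding, fitness function or DE settings. The sketch below is an illustration under stated assumptions, not the authors' method. A candidate is `[number of filters, lowest edge (Hz), highest edge (Hz)]`, and filters are spaced uniformly on the mel scale between the edges, so "spacing" is reduced here to the two edge frequencies. `surrogate_error` is a hypothetical toy objective standing in for recognition error, and its optimum (24 filters, 100-7000 Hz) is invented purely for the demo. The DE settings (population 15, F = 0.5, CR = 0.9, 60 generations) are common defaults rather than values from the paper. Only the mel (MFCC) filterbank is shown. GFCC and BFCC would use their own filter shapes.

```python
import math
import random


def hz_to_mel(f):
    # Common mel-scale formula: m = 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, f_low, f_high, n_fft=256, sample_rate=16000):
    """Return n_filters triangular filters, each a list of n_fft // 2 + 1 weights."""
    n_bins = n_fft // 2 + 1
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    # n_filters + 2 points equally spaced on the mel scale: band edges and centres
    step = (m_high - m_low) / (n_filters + 1)
    hz_pts = [mel_to_hz(m_low + i * step) for i in range(n_filters + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = hz_pts[j - 1], hz_pts[j], hz_pts[j + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= centre:
                row.append((f - left) / (centre - left))  # rising slope
            elif centre < f <= right:
                row.append((right - f) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def surrogate_error(params):
    # HYPOTHETICAL stand-in for ASR error: a real run would extract features
    # with this filterbank, train/test the recogniser and return its error.
    n_filters, f_low, f_high = round(params[0]), params[1], params[2]
    bank = mel_filterbank(n_filters, f_low, f_high)
    assert len(bank) == n_filters
    # Toy optimum (assumed, not from the paper): 24 filters spanning 100-7000 Hz
    return (((n_filters - 24) / 10.0) ** 2
            + ((f_low - 100.0) / 100.0) ** 2
            + ((f_high - 7000.0) / 1000.0) ** 2)


def differential_evolution(fitness, bounds, pop_size=15, F=0.5, CR=0.9,
                           generations=60, seed=1):
    """Minimise fitness with the classic DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [fitness(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation uses three distinct members other than the target i
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == j_rand or rng.random() < CR:  # binomial crossover
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                else:
                    v = pop[i][d]
                trial.append(min(max(v, lo), hi))  # clip to the search box
            s = fitness(trial)
            if s <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=lambda k: scores[k])
    return pop[best], scores[best]


# Search space: [number of filters, lowest edge (Hz), highest edge (Hz)]
BOUNDS = [(12.0, 40.0), (0.0, 300.0), (4000.0, 8000.0)]
best, err = differential_evolution(surrogate_error, BOUNDS)
print(round(best[0]), round(best[1]), round(best[2]), round(err, 4))
```

In a real experiment, `surrogate_error` would be replaced by a function that extracts features with the candidate filterbank and returns the recognizer's error on held-out data. That makes each fitness evaluation far more expensive than in this toy setting, which is one reason DE's small, fixed population is attractive here.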