Speech classification using SIFT features on spectrogram images

Abstract: Classification of speech is one of the most important problems in speech processing. Although there have been many studies on speech classification, the results are still limited. First, most speech classification approaches require input data of the same dimension. Second, traditional methods must be trained before classifying a speech signal and must be retrained whenever more training data or a new class is added. In this paper, we propose an approach to speech classification that uses Scale-Invariant Feature Transform (SIFT) features extracted from spectrogram images of the speech signal, combined with the Local Naive Bayes Nearest Neighbor (LNBNN) classifier. The proposed approach allows feature vectors of different sizes. With this approach, the achieved classification results are satisfactory: 73, 96, 95, 97, and 97 % on the ISOLET, English Isolated Digits, Vietnamese Places, Vietnamese Digits, and JVPD databases, respectively. On a subset of the TMW database, the accuracy even reaches 100 %. In addition, the proposed approach needs no retraining when additional training data arrive after the training phase. The experiments show that the more features are added to the model, the higher the classification accuracy.
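For illustration only, the following minimal Python sketch follows the pipeline described in the abstract: convert a waveform to a log-spectrogram image, extract SIFT descriptors from it (so each utterance yields a variable-sized set of 128-dimensional vectors), and classify by nearest-neighbour descriptor matching. The spectrogram parameters and the plain nearest-neighbour scoring are assumptions made here for the sketch; the paper itself uses the LNBNN classifier rather than the simplified scoring shown.

import numpy as np
import cv2                      # requires opencv-contrib-python for SIFT
from scipy.io import wavfile
from scipy.signal import spectrogram

def sift_from_wav(path):
    # Turn a WAV file into a log-spectrogram image and extract SIFT descriptors.
    rate, samples = wavfile.read(path)
    if samples.ndim > 1:                           # mix stereo down to mono
        samples = samples.mean(axis=1)
    _, _, spec = spectrogram(samples, fs=rate, nperseg=512, noverlap=384)
    img = np.log1p(spec)                           # compress dynamic range
    img = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, descriptors = cv2.SIFT_create().detectAndCompute(img, None)
    return descriptors                             # shape (n_keypoints, 128); n varies per utterance

def classify(query_desc, train_desc_by_class):
    # Score each class by the summed distance from every query descriptor to its
    # nearest training descriptor of that class; the lowest total distance wins.
    # This plain nearest-neighbour scoring is a simplified stand-in for LNBNN.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    scores = {label: sum(m.distance for m in matcher.match(query_desc, desc))
              for label, desc in train_desc_by_class.items()}
    return min(scores, key=scores.get)

Because descriptors are matched per keypoint rather than pooled into a fixed-length vector, utterances of different lengths need no resampling or padding, which is the property the abstract highlights.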

Bibliographic Details
Main Authors: Quang Trung Nguyen, The Duy Bui (Human Machine Interaction Laboratory, University of Engineering and Technology, VNU Hanoi)
Format: Article
Language: English
Published: World Scientific Publishing 2016-06-01
Series: Vietnam Journal of Computer Science, Vol. 3, No. 4, pp. 247-257
ISSN: 2196-8888, 2196-8896
DOI: 10.1007/s40595-016-0071-3
Subjects: LNBNN; SIFT; Speech perception; Speech classification
Online Access: http://link.springer.com/article/10.1007/s40595-016-0071-3