Speech classification using SIFT features on spectrogram images
Abstract: Classification of speech is one of the most vital problems in speech processing. Although there have been many studies on the classification of speech, the results are still limited. Firstly, most speech classification approaches require input data to have the same dimension. Secondly...
Main Authors: | Quang Trung Nguyen, The Duy Bui |
---|---|
Format: | Article |
Language: | English |
Published: | World Scientific Publishing, 2016-06-01 |
Series: | Vietnam Journal of Computer Science |
Subjects: | LNBNN; SIFT; Speech perception; Speech classification |
Online Access: | http://link.springer.com/article/10.1007/s40595-016-0071-3 |
id |
doaj-ac1ac9eebdc44971b6a75efec7e4878f |
---|---|
record_format |
Article |
spelling |
doaj-ac1ac9eebdc44971b6a75efec7e4878f; 2020-11-25T02:46:50Z; eng; World Scientific Publishing; Vietnam Journal of Computer Science; ISSN 2196-8888, 2196-8896; 2016-06-01; vol. 3, no. 4, pp. 247-257; doi:10.1007/s40595-016-0071-3; Speech classification using SIFT features on spectrogram images; Quang Trung Nguyen, The Duy Bui (both: Human Machine Interaction Laboratory, University of Engineering and Technology, VNU Hanoi); http://link.springer.com/article/10.1007/s40595-016-0071-3; LNBNN; SIFT; Speech perception; Speech classification |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Quang Trung Nguyen, The Duy Bui |
title |
Speech classification using SIFT features on spectrogram images |
publisher |
World Scientific Publishing |
series |
Vietnam Journal of Computer Science |
issn |
2196-8888, 2196-8896 |
publishDate |
2016-06-01 |
description |
Classification of speech is one of the most vital problems in speech processing. Although there have been many studies on the classification of speech, the results are still limited. Firstly, most speech classification approaches require input data to have the same dimension. Secondly, all traditional methods must be trained before classifying a speech signal and must be retrained whenever more training data or a new class is added. In this paper, we propose an approach to speech classification that uses Scale-Invariant Feature Transform (SIFT) features extracted from spectrogram images of the speech signal in combination with Local Naïve Bayes Nearest Neighbor (LNBNN). The proposed approach allows feature vectors of different sizes. The classification accuracy achieved is satisfactory: 73 %, 96 %, 95 %, 97 %, and 97 % on the ISOLET, English Isolated Digits, Vietnamese Places, Vietnamese Digits, and JVPD databases, respectively. On a subset of the TMW database, the accuracy is 100 %. In addition, the proposed approach needs no retraining when training data are added after the training phase. The experiments show that the more features are added to the model, the higher the accuracy. |
topic |
LNBNN; SIFT; Speech perception; Speech classification |
url |
http://link.springer.com/article/10.1007/s40595-016-0071-3 |
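
The abstract credits two properties to the LNBNN classifier: inputs may contain different numbers of feature vectors, and new data or classes can be added without retraining. The sketch below illustrates both using the general LNBNN decision rule (per-class nearest-neighbor distance minus the distance to the (k+1)-th neighbor), not necessarily the authors' exact implementation. It assumes the SIFT descriptors have already been extracted from the spectrogram images; the class name `LocalNBNN`, the parameter `k`, the brute-force search, and the 2-D toy descriptors (real SIFT descriptors are 128-dimensional) are all illustrative assumptions.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class LocalNBNN:
    """Toy Local Naive Bayes Nearest Neighbor classifier (brute-force search)."""

    def __init__(self, k=3):
        self.k = k
        self.index = []  # list of (descriptor, label) pairs

    def add(self, descriptors, label):
        # "Training" only stores descriptors, so more data or a new class
        # can be appended at any time without retraining.
        self.index.extend((tuple(d), label) for d in descriptors)

    def predict(self, descriptors):
        # `descriptors` may hold any number of vectors per utterance.
        totals = defaultdict(float)
        for d in descriptors:
            # The k + 1 nearest stored descriptors across all classes.
            nearest = sorted((sq_dist(d, p), lab) for p, lab in self.index)
            nearest = nearest[: self.k + 1]
            background = nearest[-1][0]  # distance to the (k+1)-th neighbor
            best = {}
            for dist, lab in nearest[:-1]:
                best[lab] = min(dist, best.get(lab, dist))
            # Only classes seen among the k nearest neighbors are updated.
            for lab, dist in best.items():
                totals[lab] += dist - background
        # The class with the lowest accumulated total wins.
        return min(totals, key=totals.get)


clf = LocalNBNN(k=2)
clf.add([(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)], "a")
clf.add([(5.0, 5.0), (5.1, 5.0)], "b")
pred_one = clf.predict([(0.0, 0.02)])               # one descriptor
pred_two = clf.predict([(5.0, 5.05), (4.9, 5.0)])   # two descriptors

# A new class is added after the fact; nothing is retrained.
clf.add([(10.0, 0.0), (10.1, 0.0)], "c")
pred_new = clf.predict([(10.0, 0.05)])
```

In practice the brute-force `sorted` call would be replaced by an approximate nearest-neighbor index, since each utterance can yield hundreds of descriptors.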