Speech classification using SIFT features on spectrogram images

Abstract: Classification of speech is one of the most vital problems in speech processing. Although there have been many studies on the classification of speech, the results are still limited. Firstly, most speech classification approaches require input data of the same dimension. Secondly...

Bibliographic Details
Main Authors:	Quang Trung Nguyen, The Duy Bui
Format:	Article
Language:	English
Published:	World Scientific Publishing 2016-06-01
Series:	Vietnam Journal of Computer Science
Subjects:	LNBNN, SIFT, Speech perception, Speech classification
Online Access:	http://link.springer.com/article/10.1007/s40595-016-0071-3

id	doaj-ac1ac9eebdc44971b6a75efec7e4878f
record_format	Article
spelling	doaj-ac1ac9eebdc44971b6a75efec7e4878f | 2020-11-25T02:46:50Z | eng | World Scientific Publishing | Vietnam Journal of Computer Science | 2196-8888 | 2196-8896 | 2016-06-01 | 3 | 4 | 247 | 257 | 10.1007/s40595-016-0071-3 | Speech classification using SIFT features on spectrogram images | Quang Trung Nguyen 0 | The Duy Bui 1 | Human Machine Interaction Laboratory, University of Engineering and Technology, VNU Hanoi | Human Machine Interaction Laboratory, University of Engineering and Technology, VNU Hanoi | http://link.springer.com/article/10.1007/s40595-016-0071-3 | LNBNN | SIFT | Speech perception | Speech classification
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Quang Trung Nguyen, The Duy Bui
title	Speech classification using SIFT features on spectrogram images
publisher	World Scientific Publishing
series	Vietnam Journal of Computer Science
issn	2196-8888 2196-8896
publishDate	2016-06-01
description	Abstract: Classification of speech is one of the most vital problems in speech processing. Although there have been many studies on the classification of speech, the results are still limited. Firstly, most speech classification approaches require input data of the same dimension. Secondly, all traditional methods must be trained before classifying a speech signal and must be retrained when more training data or a new class is added. In this paper, we propose an approach to speech classification that uses Scale-invariant Feature Transform (SIFT) features extracted from spectrogram images of the speech signal, in combination with Local naïve Bayes nearest neighbor (LNBNN). The proposed approach allows feature vectors of different sizes. With this approach, the achieved classification results are satisfactory: 73 %, 96 %, 95 %, 97 %, and 97 % on the ISOLET, English Isolated Digits, Vietnamese Places, Vietnamese Digits, and JVPD databases, respectively. On a subset of the TMW database, the accuracy is 100 %. In addition, the proposed approach needs no retraining when training data are added after the training phase. The experiments show that the more features are added to the model, the higher the accuracy.
topic	LNBNN, SIFT, Speech perception, Speech classification
url	http://link.springer.com/article/10.1007/s40595-016-0071-3

Similar Items