On the Use of Complementary Spectral Features for Speaker Recognition

The most popular features for speaker recognition are Mel frequency cepstral coefficients (MFCCs) and linear prediction cepstral coefficients (LPCCs). These features are used extensively because they characterize the vocal tract configuration which is known to be highly speaker-dependent. In this wo...


Bibliographic Details
Main Authors: Sridhar Krishnan, Danoush Hosseinzadeh
Format: Article
Language: English
Published: SpringerOpen 2007-12-01
Series: EURASIP Journal on Advances in Signal Processing
Online Access: http://dx.doi.org/10.1155/2008/258184
id doaj-de8d6d7892ca4f668bc51dcc7810c8ac
record_format Article
spelling doaj-de8d6d7892ca4f668bc51dcc7810c8ac 2020-11-25T00:39:10Z eng SpringerOpen EURASIP Journal on Advances in Signal Processing 1687-6172 2007-12-01 2008 10.1155/2008/258184 On the Use of Complementary Spectral Features for Speaker Recognition Sridhar Krishnan Danoush Hosseinzadeh The most popular features for speaker recognition are Mel frequency cepstral coefficients (MFCCs) and linear prediction cepstral coefficients (LPCCs). These features are used extensively because they characterize the vocal tract configuration which is known to be highly speaker-dependent. In this work, several features are introduced that can characterize the vocal system in order to complement the traditional features and produce better speaker recognition models. The spectral centroid (SC), spectral bandwidth (SBW), spectral band energy (SBE), spectral crest factor (SCF), spectral flatness measure (SFM), Shannon entropy (SE), and Renyi entropy (RE) were utilized for this purpose. This work demonstrates that these features are robust in noisy conditions by simulating some common distortions that are found in the speakers' environment and a typical telephone channel. Babble noise, additive white Gaussian noise (AWGN), and a bandpass channel with 1 dB of ripple were used to simulate these noisy conditions. The results show significant improvements in classification performance for all noise conditions when these features were used to complement the MFCC and ΔMFCC features. In particular, the SC and SCF improved performance in almost all noise conditions within the examined SNR range (10–40 dB). For example, in cases where there was only one source of distortion, classification improvements of up to 8% and 10% were achieved under babble noise and AWGN, respectively, using the SCF feature. http://dx.doi.org/10.1155/2008/258184
collection DOAJ
language English
format Article
sources DOAJ
author Sridhar Krishnan
Danoush Hosseinzadeh
spellingShingle Sridhar Krishnan
Danoush Hosseinzadeh
On the Use of Complementary Spectral Features for Speaker Recognition
EURASIP Journal on Advances in Signal Processing
author_facet Sridhar Krishnan
Danoush Hosseinzadeh
author_sort Sridhar Krishnan
title On the Use of Complementary Spectral Features for Speaker Recognition
title_short On the Use of Complementary Spectral Features for Speaker Recognition
title_full On the Use of Complementary Spectral Features for Speaker Recognition
title_fullStr On the Use of Complementary Spectral Features for Speaker Recognition
title_full_unstemmed On the Use of Complementary Spectral Features for Speaker Recognition
title_sort on the use of complementary spectral features for speaker recognition
publisher SpringerOpen
series EURASIP Journal on Advances in Signal Processing
issn 1687-6172
publishDate 2007-12-01
description The most popular features for speaker recognition are Mel frequency cepstral coefficients (MFCCs) and linear prediction cepstral coefficients (LPCCs). These features are used extensively because they characterize the vocal tract configuration which is known to be highly speaker-dependent. In this work, several features are introduced that can characterize the vocal system in order to complement the traditional features and produce better speaker recognition models. The spectral centroid (SC), spectral bandwidth (SBW), spectral band energy (SBE), spectral crest factor (SCF), spectral flatness measure (SFM), Shannon entropy (SE), and Renyi entropy (RE) were utilized for this purpose. This work demonstrates that these features are robust in noisy conditions by simulating some common distortions that are found in the speakers' environment and a typical telephone channel. Babble noise, additive white Gaussian noise (AWGN), and a bandpass channel with 1 dB of ripple were used to simulate these noisy conditions. The results show significant improvements in classification performance for all noise conditions when these features were used to complement the MFCC and ΔMFCC features. In particular, the SC and SCF improved performance in almost all noise conditions within the examined SNR range (10–40 dB). For example, in cases where there was only one source of distortion, classification improvements of up to 8% and 10% were achieved under babble noise and AWGN, respectively, using the SCF feature.
url http://dx.doi.org/10.1155/2008/258184
work_keys_str_mv AT sridharkrishnan ontheuseofcomplementaryspectralfeaturesforspeakerrecognition
AT danoushhosseinzadeh ontheuseofcomplementaryspectralfeaturesforspeakerrecognition
_version_ 1725294855628783616
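
Illustrative sketch (not part of the catalog record): the description above names several spectral features (SC, SBW, SCF, SFM, SE, RE) without defining them. The Python code below shows one common textbook way to compute them for a single frame. It is a sketch under stated assumptions, not the authors' implementation: the function name spectral_features, the Hamming window, the full-band computation (no subband split), the Renyi order alpha = 3, and the exact normalizations are all choices made here for illustration and may differ from the article.

```python
import numpy as np


def spectral_features(frame, sample_rate, alpha=3.0, eps=1e-12):
    """Compute full-band spectral descriptors for one audio frame.

    All definitions are common textbook forms and are assumptions here;
    the article may use per-subband or differently normalized variants.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))

    # One-sided power spectrum; eps keeps the logarithms below finite.
    power = np.abs(np.fft.rfft(windowed)) ** 2 + eps
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Treat the normalized power spectrum as a probability distribution.
    p = power / power.sum()

    centroid = np.sum(freqs * p)                            # SC, in Hz
    bandwidth = np.sqrt(np.sum((freqs - centroid) ** 2 * p))  # SBW, in Hz
    crest = power.max() / power.mean()                      # SCF
    flatness = np.exp(np.mean(np.log(power))) / power.mean()  # SFM
    shannon = -np.sum(p * np.log2(p))                       # SE, in bits
    renyi = np.log2(np.sum(p ** alpha)) / (1.0 - alpha)     # RE, in bits

    return {
        "SC": float(centroid),
        "SBW": float(bandwidth),
        "SCF": float(crest),
        "SFM": float(flatness),
        "SE": float(shannon),
        "RE": float(renyi),
    }


if __name__ == "__main__":
    # Hypothetical test signal: a 1 kHz tone sampled at 8 kHz.
    sr = 8000
    t = np.arange(256) / sr
    tone = np.sin(2 * np.pi * 1000.0 * t)
    for name, value in spectral_features(tone, sr).items():
        print(f"{name}: {value:.4f}")
```

In a setup like the one the description outlines, values such as these would be appended to each frame's MFCC and ΔMFCC vector before model training. How the features are combined and modeled is not stated in this record.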