Robust Features and Efficient Models for Speaker Identification


Bibliographic Details
Main Author: Kuo-Hwei Yuo (游國輝)
Other Authors: Hsiao-Chuan Wang
Format: Others
Language: zh-TW
Published: 2000
Online Access: http://ndltd.ncl.edu.tw/handle/58137971312172176468
id ndltd-TW-088NTHU0442099
record_format oai_dc
spelling ndltd-TW-088NTHU04420992016-07-08T04:23:17Z http://ndltd.ncl.edu.tw/handle/58137971312172176468 Robust Features and Efficient Models for Speaker Identification 適用於語者辨認之強鍵特徵參數和高效率模型 Kuo-Hwei Yuo 游國輝 Hsiao-Chuan Wang 王小川 2000 學位論文 ; thesis 118 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Ph.D. === National Tsing Hua University === Department of Electrical Engineering === 88 === The objective of this dissertation is to find robust features and efficient models that improve speaker recognition performance. Two types of robust features are presented: one is robust to additive noise, and the other is robust to the simultaneous presence of additive and convolutional noise. In addition, we present two statistical models that depict a speaker’s feature space more efficiently than the classical Gaussian mixture model (GMM) with diagonal covariance matrices.

The first robust feature is based on filtering the temporal trajectories of the short-time one-sided autocorrelation sequences of speech to remove additive noise. The filtered sequences are called relative autocorrelation sequences (RAS), and mel-scale frequency cepstral coefficients (MFCC) are extracted from the RAS instead of from the original speech; the resulting feature set is denoted RAS-MFCC. The second robust feature involves two stages of temporal trajectory filtering: the first filter is applied in the autocorrelation domain to remove additive noise, and the second in the logarithmic spectrum domain to remove convolutional noise. The filtered sequence is called the CHAnnel-normalized Relative Autocorrelation Sequence (CHARAS), and the MFCCs extracted from it are called CHARAS-MFCC; the RAS-MFCC is a special case of the CHARAS-MFCC. We conduct experiments in a variety of noisy environments involving both additive and convolutional noise. The RAS-MFCC and CHARAS-MFCC are shown to be superior to the projection method, and combining them with the projection measure further improves identification accuracy.

Next, we present a new GMM structure that depicts the speaker’s feature space more efficiently than the traditional GMM structure. It embeds a common decorrelating transformation matrix in all Gaussian pdfs. The idea is similar to a classical approach derived from the Karhunen-Loève transformation, but the two algorithms for deriving the transformation matrix are inherently different. The proposed model is called the transformation-embedded GMM (TE-GMM); its transformation matrix and the other model parameters can be trained simultaneously using maximum likelihood estimation. We then generalize the single transformation used in the TE-GMM to multiple transformations, deriving the general covariance GMM (GC-GMM). A GMM with diagonal covariance matrices is denoted DC-GMM, and a GMM with full covariance matrices is denoted FC-GMM; both are special cases of the GC-GMM. Experimental results show that the TE-GMM achieves better accuracy than the classical Karhunen-Loève transformation method, and that, compared with the traditional GMM, the GC-GMM significantly reduces the computational complexity and the number of parameters without degrading system performance.
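As a concrete illustration of the RAS-MFCC front end described in the abstract, the Python sketch below frames the signal, computes short-time one-sided autocorrelation sequences, and high-pass filters each lag trajectory over time. It is a minimal reconstruction from the abstract, not the dissertation's implementation; the frame size, the delta-style filter taps, the lag count, and the FFT size are all assumptions.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (windowing omitted)."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def one_sided_autocorr(frames, max_lag=64):
    """Short-time one-sided autocorrelation r_t(k), k = 0..max_lag-1."""
    n_frames, frame_len = frames.shape
    r = np.zeros((n_frames, max_lag))
    for t in range(n_frames):
        full = np.correlate(frames[t], frames[t], mode="full")
        r[t] = full[frame_len - 1 : frame_len - 1 + max_lag]  # lags 0..max_lag-1
    return r

def ras_filter(r, M=2):
    """Delta-style FIR filter applied along the time trajectory of each
    autocorrelation lag.  The taps sum to zero (zero DC gain), so a
    near-constant noise offset in each trajectory is removed."""
    taps = np.arange(-M, M + 1, dtype=float)
    taps /= np.sum(taps ** 2)
    # np.convolve flips its kernel, so flip taps to get an FIR correlation
    return np.apply_along_axis(
        lambda traj: np.convolve(traj, taps[::-1], mode="same"), 0, r)

def ras_power_spectrum(r_filtered, n_fft=512):
    """Magnitude spectrum of the filtered autocorrelation per frame; the
    standard mel filterbank + log + DCT applied to this spectrum (instead
    of to the waveform's spectrum) would yield RAS-MFCC-style features."""
    return np.abs(np.fft.rfft(r_filtered, n=n_fft, axis=1))
```

The key property this sketch relies on is that stationary additive noise contributes a roughly time-constant term to each lag trajectory, so any trajectory filter with zero DC gain suppresses the noise while preserving the faster-varying speech component.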
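The second CHARAS stage operates in the logarithmic spectrum domain, where a fixed transmission channel multiplies every short-time spectrum and therefore appears as a constant additive bias along each log-spectral trajectory. The abstract does not specify the filter used, so as a hedged sketch the snippet below uses the simplest zero-DC trajectory filter, mean subtraction over time:

```python
def channel_normalize(log_spectra):
    """Second-stage trajectory filtering in the log-spectrum domain
    (illustrative stand-in for the dissertation's filter).  A fixed
    channel adds a constant bias to each log-spectral trajectory;
    subtracting the per-bin temporal mean removes that bias."""
    return log_spectra - log_spectra.mean(axis=0, keepdims=True)
```

In this sketch, applying such a filter to the log mel spectrum computed from the RAS power spectrum, then taking the DCT, would produce CHARAS-MFCC-style features robust to both noise types.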
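The TE-GMM likelihood can be written compactly: with a single transformation A shared by all components, each Gaussian keeps a diagonal covariance in the transformed space, and the change of variables contributes a log|det A| term. The sketch below is an illustrative reconstruction of that likelihood, not the dissertation's estimation code; all parameter names are assumptions.

```python
import numpy as np

def te_gmm_loglik(x, A, weights, means, diag_vars):
    """Log-likelihood of one feature vector x under a TE-GMM-style model:
    p(x) = sum_i w_i * N(A x; mu_i, diag(var_i)) * |det A|, where the
    transformation A is shared by all mixture components."""
    z = A @ x                                    # shared decorrelating transform
    _, log_det = np.linalg.slogdet(A)            # log|det A| (Jacobian term)
    diff2 = (z - means) ** 2                     # shape: (n_components, dim)
    comp_ll = (
        -0.5 * np.sum(diff2 / diag_vars + np.log(2.0 * np.pi * diag_vars), axis=1)
        + log_det
    )
    # log-sum-exp over components, weighted by the mixture weights
    return np.logaddexp.reduce(np.log(weights) + comp_ll)
```

On one plausible reading of the abstract, the GC-GMM then lets components draw their transformation from a pool rather than sharing a single A: a lone identity transformation recovers the DC-GMM, one transformation per component matches the modeling power of the FC-GMM, and intermediate pool sizes trade accuracy against parameter count and computation, which is where the reported savings would come from.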
author2 Hsiao-Chuan Wang
author_facet Hsiao-Chuan Wang
Kuo-Hwei Yuo
游國輝
author Kuo-Hwei Yuo
游國輝
spellingShingle Kuo-Hwei Yuo
游國輝
Robust Features and Efficient Models for Speaker Identification
author_sort Kuo-Hwei Yuo
title Robust Features and Efficient Models for Speaker Identification
title_short Robust Features and Efficient Models for Speaker Identification
title_full Robust Features and Efficient Models for Speaker Identification
title_fullStr Robust Features and Efficient Models for Speaker Identification
title_full_unstemmed Robust Features and Efficient Models for Speaker Identification
title_sort robust features and efficient models for speaker identification
publishDate 2000
url http://ndltd.ncl.edu.tw/handle/58137971312172176468
work_keys_str_mv AT kuohweiyuo robustfeaturesandefficientmodelsforspeakeridentification
AT yóuguóhuī robustfeaturesandefficientmodelsforspeakeridentification
AT kuohweiyuo shìyòngyúyǔzhěbiànrènzhīqiángjiàntèzhēngcānshùhégāoxiàolǜmóxíng
AT yóuguóhuī shìyòngyúyǔzhěbiànrènzhīqiángjiàntèzhēngcānshùhégāoxiàolǜmóxíng
_version_ 1718341575518453760