Robust Features and Efficient Models for Speaker Identification


Bibliographic Details
Main Author: Kuo-Hwei Yuo (游國輝)
Other Authors: Hsiao-Chuan Wang
Format: Others
Language: zh-TW
Published: 2000
Online Access: http://ndltd.ncl.edu.tw/handle/58137971312172176468
id ndltd-TW-088NTHU0442099
record_format oai_dc
spelling ndltd-TW-088NTHU04420992016-07-08T04:23:17Z http://ndltd.ncl.edu.tw/handle/58137971312172176468 Robust Features and Efficient Models for Speaker Identification 適用於語者辨認之強鍵特徵參數和高效率模型 Kuo-Hwei Yuo 游國輝 Hsiao-Chuan Wang 王小川 2000 學位論文 ; thesis 118 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Ph.D. === National Tsing Hua University === Department of Electrical Engineering === 88 === The objective of this dissertation is to find robust features and efficient models that improve speaker recognition performance. Two types of robust features are presented: one is robust to additive noise, and the other is robust to the simultaneous presence of additive and convolutional noise. In addition, we present two statistical models that depict a speaker’s feature space more efficiently than the classical Gaussian mixture model (GMM) with diagonal covariance matrices.

The first robust feature is based on filtering the temporal trajectories of the short-time one-sided autocorrelation sequences of speech to remove additive noise. The filtered sequences are called relative autocorrelation sequences (RAS), and mel-scale frequency cepstral coefficients (MFCC) are extracted from the RAS instead of from the original speech; the resulting feature set is denoted RAS-MFCC. The second robust feature involves two stages of temporal trajectory filtering: the first filter is applied in the autocorrelation domain to remove additive noise, and the second in the logarithmic spectrum domain to remove convolutional noise. The filtered sequence is called the CHAnnel-normalized Relative Autocorrelation Sequence (CHARAS), and the MFCCs extracted from it are called CHARAS-MFCC; the RAS-MFCC is a special case of the CHARAS-MFCC. We conduct experiments in a variety of noisy environments involving both additive and convolutional noise. The RAS-MFCC and CHARAS-MFCC are shown to be superior to the projection method, and combining them with the projection measure further improves identification accuracy.

Next, we present a new GMM structure that depicts the speaker’s feature space more efficiently than the traditional GMM structure. It embeds a common decorrelating transformation matrix in all Gaussian pdfs. The idea is similar to a classical approach derived from the Karhunen-Loève transformation, but the two algorithms for deriving the transformation matrix are inherently different. The proposed model is called the transformation-embedded GMM (TE-GMM); its transformation matrix and the other model parameters can be trained simultaneously using maximum likelihood estimation. We then generalize the single transformation used in the TE-GMM to multiple transformations, deriving the general covariance GMM (GC-GMM). A GMM with diagonal covariance matrices is denoted DC-GMM, and a GMM with full covariance matrices is denoted FC-GMM; both are special cases of the GC-GMM. Experimental results show that the TE-GMM achieves better accuracy than the classical Karhunen-Loève transformation method, and that, compared with the traditional GMM, the GC-GMM significantly reduces the computational complexity and the number of parameters without degrading system performance.
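As a concrete illustration of the RAS-MFCC front end described in the abstract, the Python sketch below frames the signal, computes short-time one-sided autocorrelation sequences, and high-pass filters each lag trajectory over time. It is a minimal reconstruction from the abstract, not the dissertation's implementation; the frame size, the delta-style filter taps, the lag count, and the FFT size are all assumptions.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (windowing omitted)."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def one_sided_autocorr(frames, max_lag=64):
    """Short-time one-sided autocorrelation r_t(k), k = 0..max_lag-1."""
    n_frames, frame_len = frames.shape
    r = np.zeros((n_frames, max_lag))
    for t in range(n_frames):
        full = np.correlate(frames[t], frames[t], mode="full")
        r[t] = full[frame_len - 1 : frame_len - 1 + max_lag]  # lags 0..max_lag-1
    return r

def ras_filter(r, M=2):
    """Delta-style FIR filter applied along the time trajectory of each
    autocorrelation lag.  The taps sum to zero (zero DC gain), so a
    near-constant noise offset in each trajectory is removed."""
    taps = np.arange(-M, M + 1, dtype=float)
    taps /= np.sum(taps ** 2)
    # np.convolve flips its kernel, so flip taps to get an FIR correlation
    return np.apply_along_axis(
        lambda traj: np.convolve(traj, taps[::-1], mode="same"), 0, r)

def ras_power_spectrum(r_filtered, n_fft=512):
    """Magnitude spectrum of the filtered autocorrelation per frame; the
    standard mel filterbank + log + DCT applied to this spectrum (instead
    of to the waveform's spectrum) would yield RAS-MFCC-style features."""
    return np.abs(np.fft.rfft(r_filtered, n=n_fft, axis=1))
```

The key property this sketch relies on is that stationary additive noise contributes a roughly time-constant term to each lag trajectory, so any trajectory filter with zero DC gain suppresses the noise while preserving the faster-varying speech component.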
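The second CHARAS stage operates in the logarithmic spectrum domain, where a fixed transmission channel multiplies every short-time spectrum and therefore appears as a constant additive bias along each log-spectral trajectory. The abstract does not specify the filter used, so as a hedged sketch the snippet below uses the simplest zero-DC trajectory filter, mean subtraction over time:

```python
def channel_normalize(log_spectra):
    """Second-stage trajectory filtering in the log-spectrum domain
    (illustrative stand-in for the dissertation's filter).  A fixed
    channel adds a constant bias to each log-spectral trajectory;
    subtracting the per-bin temporal mean removes that bias."""
    return log_spectra - log_spectra.mean(axis=0, keepdims=True)
```

In this sketch, applying such a filter to the log mel spectrum computed from the RAS power spectrum, then taking the DCT, would produce CHARAS-MFCC-style features robust to both noise types.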
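The TE-GMM likelihood can be written compactly: with a single transformation A shared by all components, each Gaussian keeps a diagonal covariance in the transformed space, and the change of variables contributes a log|det A| term. The sketch below is an illustrative reconstruction of that likelihood, not the dissertation's estimation code; all parameter names are assumptions.

```python
import numpy as np

def te_gmm_loglik(x, A, weights, means, diag_vars):
    """Log-likelihood of one feature vector x under a TE-GMM-style model:
    p(x) = sum_i w_i * N(A x; mu_i, diag(var_i)) * |det A|, where the
    transformation A is shared by all mixture components."""
    z = A @ x                                    # shared decorrelating transform
    _, log_det = np.linalg.slogdet(A)            # log|det A| (Jacobian term)
    diff2 = (z - means) ** 2                     # shape: (n_components, dim)
    comp_ll = (
        -0.5 * np.sum(diff2 / diag_vars + np.log(2.0 * np.pi * diag_vars), axis=1)
        + log_det
    )
    # log-sum-exp over components, weighted by the mixture weights
    return np.logaddexp.reduce(np.log(weights) + comp_ll)
```

On one plausible reading of the abstract, the GC-GMM then lets components draw their transformation from a pool rather than sharing a single A: a lone identity transformation recovers the DC-GMM, one transformation per component matches the modeling power of the FC-GMM, and intermediate pool sizes trade accuracy against parameter count and computation, which is where the reported savings would come from.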
author2 Hsiao-Chuan Wang
author_facet Hsiao-Chuan Wang
Kuo-Hwei Yuo
游國輝
author Kuo-Hwei Yuo
游國輝
spellingShingle Kuo-Hwei Yuo
游國輝
Robust Features and Efficient Models for Speaker Identification
author_sort Kuo-Hwei Yuo
title Robust Features and Efficient Models for Speaker Identification
title_short Robust Features and Efficient Models for Speaker Identification
title_full Robust Features and Efficient Models for Speaker Identification
title_fullStr Robust Features and Efficient Models for Speaker Identification
title_full_unstemmed Robust Features and Efficient Models for Speaker Identification
title_sort robust features and efficient models for speaker identification
publishDate 2000
url http://ndltd.ncl.edu.tw/handle/58137971312172176468
work_keys_str_mv AT kuohweiyuo robustfeaturesandefficientmodelsforspeakeridentification
AT yóuguóhuī robustfeaturesandefficientmodelsforspeakeridentification
AT kuohweiyuo shìyòngyúyǔzhěbiànrènzhīqiángjiàntèzhēngcānshùhégāoxiàolǜmóxíng
AT yóuguóhuī shìyòngyúyǔzhěbiànrènzhīqiángjiàntèzhēngcānshùhégāoxiàolǜmóxíng
_version_ 1718341575518453760