Aggregate A Posteriori Linear Regression for Speech Recognition
Master's thesis === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === 92 === This study proposes a discriminative linear regression adaptation algorithm based on the aggregate a posteriori probability. For model parameter estimation, discriminative training outperforms maximum likelihood training. Not only the similarity of obs...
Main Authors: | Yii-Kai Wang, 王奕凱 |
---|---|
Other Authors: | Jen-Tzung Chien |
Format: | Others |
Language: | zh-TW |
Published: | 2004 |
Online Access: | http://ndltd.ncl.edu.tw/handle/22610583848346334523 |
id | ndltd-TW-092NCKU5392043 |
---|---|
record_format | oai_dc |
spelling | ndltd-TW-092NCKU5392043 2016-06-17T04:16:57Z http://ndltd.ncl.edu.tw/handle/22610583848346334523 Aggregate A Posteriori Linear Regression for Speech Recognition 聚集事後機率線性迴歸調適法應用於語音辨識 Yii-Kai Wang 王奕凱 Master's thesis, National Cheng Kung University, Department of Computer Science and Information Engineering (Master's and Doctoral Program), 92 Jen-Tzung Chien 簡仁宗 2004 thesis 97 zh-TW |
collection | NDLTD |
language | zh-TW |
format | Others |
sources | NDLTD |
description | Master's thesis === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === 92 === This study proposes a discriminative linear regression adaptation algorithm based on the aggregate a posteriori probability. For model parameter estimation, discriminative training outperforms maximum likelihood training, because it considers not only the similarity between an observation and the target model but also the likelihood ratio between the target model and its competing models. As a result, discriminatively trained model parameters can effectively reduce the classification error rate. The drawback is a longer training time, because gradient descent has been the only algorithm used for this kind of training.
Among linear regression-based model adaptation algorithms, maximum likelihood linear regression (MLLR) was the first to be proposed. Because a regression class ties similar acoustic units together so that they share the same regression matrix, MLLR adapts better than maximum a posteriori (MAP) adaptation when there are too few adaptation utterances to adapt all parameters. To make regression matrix adaptation more robust, maximum a posteriori linear regression (MAPLR) and quasi-Bayes linear regression (QBLR), the latter capable of online adaptation, were proposed; both include a prior density of the regression matrix. More recently, motivated by the superiority of discriminative training, discriminative training was combined with linear regression adaptation to give minimum classification error linear regression (MCELR) adaptation.
In this work, prior information about the regression matrix is adopted when the discriminative training criterion is used to adapt the matrix. Combining the robustness of the Bayes criterion with the discrimination of the minimum classification error criterion yields a better regression matrix for speaker adaptation. Based on the relation between the aggregate a posteriori probability and discriminative training, a closed-form solution for parameter estimation is obtained under proper simplification, which accelerates discriminative parameter estimation. The theoretical difference between MAPLR and the proposed method is also established.
In the experiments, the TCC300 database was used to train the speaker-independent (SI) acoustic models and the prior distribution of the regression matrix. For parameter adaptation and testing, the TV broadcast news database collected by the Public Television Service Foundation (PTS) was used to evaluate adaptation performance. We evaluated the robustness of regression matrix adaptation with different numbers of adaptation utterances and compared the performance of different regression matrix adaptation algorithms, including MLLR, MAPLR, QBLR, and MCELR. |
author2 | Jen-Tzung Chien |
author_facet | Jen-Tzung Chien Yii-Kai Wang 王奕凱 |
author | Yii-Kai Wang 王奕凱 |
spellingShingle | Yii-Kai Wang 王奕凱 Aggregate A Posteriori Linear Regression for Speech Recognition |
author_sort | Yii-Kai Wang |
title | Aggregate A Posteriori Linear Regression for Speech Recognition |
publishDate | 2004 |
url | http://ndltd.ncl.edu.tw/handle/22610583848346334523 |
work_keys_str_mv | AT yiikaiwang aggregateaposteriorilinearregressionforspeechrecognition AT wángyìkǎi aggregateaposteriorilinearregressionforspeechrecognition AT yiikaiwang jùjíshìhòujīlǜxiànxìnghuíguīdiàoshìfǎyīngyòngyúyǔyīnbiànshí AT wángyìkǎi jùjíshìhòujīlǜxiànxìnghuíguīdiàoshìfǎyīngyòngyúyǔyīnbiànshí |
_version_ | 1718308420269899776 |