Integration of Acoustic and Linguistic Features for Maximum Entropy Speech Recognition

Bibliographic Details
Main Authors: To-Chang Chien, 錢鐸樟
Other Authors: Jen-Tzung Chien
Format: Others
Language: zh-TW
Published: 2005
Online Access:http://ndltd.ncl.edu.tw/handle/24325293971312481529
Description
Summary: Master's thesis === National Cheng Kung University === Department of Computer Science and Information Engineering === Academic year 93 === In a traditional speech recognition system, the acoustic and linguistic information sources are assumed to be independent. The parameters of the acoustic hidden Markov model (HMM) and the linguistic n-gram model are estimated individually and then combined to build a plug-in maximum a posteriori (MAP) classification rule. However, the acoustic model and the language model are correlated in essence, and relaxing the independence assumption can improve speech recognition performance. In this study, we propose an integrated approach based on the maximum entropy (ME) principle, in which acoustic and linguistic features are optimally combined in a unified framework. Using this approach, the associations between acoustic and linguistic features are explored and merged in the integrated models. On the issue of discriminative training, we also establish the relationship between the ME model and the discriminative maximum mutual information (MMI) model. In addition, the ME integrated model is general, so semantic topics and long-distance association patterns can be incorporated as well. In the experiments, we apply the proposed ME model to broadcast news transcription using the MATBN database. Preliminary experimental results show an improvement over a conventional speech recognition system based on the plug-in MAP classification rule.
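The contrast the abstract draws can be sketched as follows. In a plug-in MAP decoder, the separately trained acoustic log-likelihood and language-model log-probability are combined with a fixed language-model weight; in an ME (log-linear) model, all features enter one exponential model with jointly estimated weights, so the plug-in rule is a special case and further features (e.g. cross-stream association features) can be added to the same framework. All scores, weights, and hypothesis names below are hypothetical placeholders for illustration, not values from the thesis or the MATBN experiments.

```python
# Hypothetical per-hypothesis feature scores (log domain), not real data:
#   f_acoustic ~ log p(X|W) from the acoustic HMM
#   f_lm       ~ log p(W) from the n-gram language model
hypotheses = {
    "hyp_a": {"f_acoustic": -120.0, "f_lm": -8.0},
    "hyp_b": {"f_acoustic": -118.0, "f_lm": -12.0},
}

def plug_in_map_score(feats, lm_weight=10.0):
    """Conventional plug-in MAP rule: log p(X|W) + lambda * log p(W),
    where the two models are trained independently and lambda is a
    fixed, hand-tuned language-model weight."""
    return feats["f_acoustic"] + lm_weight * feats["f_lm"]

def max_entropy_score(feats, weights):
    """ME / log-linear integration: p(W|X) proportional to
    exp(sum_i lambda_i * f_i(X, W)). The weights are estimated jointly,
    and any feature function (including acoustic-linguistic association
    features) can be added to the same sum."""
    return sum(weights[name] * value for name, value in feats.items())

# With weights (1.0, 10.0) the ME score reduces to the plug-in MAP score,
# illustrating that the plug-in rule is one point in the ME model family.
weights = {"f_acoustic": 1.0, "f_lm": 10.0}  # hypothetical values
best = max(hypotheses, key=lambda w: max_entropy_score(hypotheses[w], weights))
print(best)
```

Decoding then picks the hypothesis with the highest combined score; the thesis's point is that estimating the feature weights jointly under the ME criterion, rather than fixing them after independent training, lets the model capture correlations between the two information sources.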