Summary: | Master's === National Taiwan Normal University === Department of Computer Science and Information Engineering === 104 === This thesis explores the use of multi-task learning (MTL) and ensemble learning techniques to estimate the parameters of neural-network-based acoustic models more accurately, so as to improve the accuracy of meeting speech recognition. Our main contributions are threefold. First, we conduct an empirical study of how various auxiliary tasks can enhance the performance of multi-task learning on meeting speech recognition. We also study the synergistic effect of combining multi-task learning with disparate acoustic models, such as deep neural network (DNN) and convolutional neural network (CNN) based acoustic models, with the expectation of increasing the generalization ability of acoustic modeling. Second, because the way the contributions (weights) of different auxiliary tasks are modulated during acoustic model training is far from optimal and is in practice a matter of heuristic judgment, we propose a simple model adaptation method to alleviate this problem. Third, we investigate an ensemble learning method that systematically integrates the various acoustic models (weak learners) trained with multi-task learning. A series of experiments carried out on the Augmented Multi-party Interaction (AMI) and Mandarin meeting recording (MMRC) corpora suggests that the proposed methods are effective relative to several existing baselines.
|
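The abstract does not state the exact training objective, so the following is only a minimal sketch of one common way to weight auxiliary tasks in multi-task learning: the main-task loss plus a weighted sum of auxiliary-task losses. The function names, the example posteriors, and the weight value 0.3 are hypothetical and are not taken from the thesis.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability that the model assigns to the target class."""
    return -math.log(probs[target])


def multitask_loss(main_loss, aux_losses, aux_weights):
    """Main-task loss plus a weighted sum of auxiliary-task losses.

    The weights are the heuristic knobs the abstract refers to; how the
    thesis actually sets or adapts them is not specified here.
    """
    if len(aux_losses) != len(aux_weights):
        raise ValueError("need exactly one weight per auxiliary loss")
    return main_loss + sum(w * l for w, l in zip(aux_weights, aux_losses))


# Hypothetical posteriors for a single frame (illustrative values only).
main = cross_entropy([0.7, 0.2, 0.1], target=0)  # main task, e.g. a state label
aux = [cross_entropy([0.5, 0.5], target=1)]      # one auxiliary task
total = multitask_loss(main, aux, aux_weights=[0.3])
```

Setting an auxiliary weight to zero recovers single-task training, which is why the choice of weights directly controls how much each auxiliary task influences the shared parameters.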