Improved Neural Network Based Acoustic Modeling Leveraging Multi-task Learning and Ensemble Learning for Meeting Speech Recognition


Bibliographic Details
Main Authors: Yang, Ming-Han, 楊明翰
Other Authors: Chen, Berlin
Format: Others
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/99106158778715235461
Description
Summary: Master's Thesis === National Taiwan Normal University === Department of Computer Science and Information Engineering === 104 === This thesis sets out to explore the use of multi-task learning (MTL) and ensemble learning techniques for more accurate estimation of the parameters of neural network based acoustic models, so as to improve the accuracy of meeting speech recognition. Our main contributions are three-fold. First, we conduct an empirical study that leverages various auxiliary tasks to enhance the performance of multi-task learning on meeting speech recognition. We also study the synergy of combining multi-task learning with disparate acoustic models, such as deep neural network (DNN) and convolutional neural network (CNN) based acoustic models, with the expectation of increasing the generalization ability of acoustic modeling. Second, since the way the contributions (weights) of the different auxiliary tasks are modulated during acoustic model training is far from optimal and is, in practice, a matter of heuristic judgment, we propose a simple model adaptation method to alleviate this problem. Third, an ensemble learning method is investigated to systematically integrate the various acoustic models (weak learners) trained with multi-task learning. A series of experiments carried out on the Augmented Multi-party Interaction (AMI) and Mandarin Meeting Recording (MMRC) corpora suggest the effectiveness of the proposed methods in relation to several existing baselines.
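
The abstract does not specify the network architecture, the auxiliary task definitions, or the weighting and combination schemes, so the following is only a minimal sketch of the general recipe it describes, written in PyTorch: a DNN acoustic model with shared hidden layers, a primary senone-classification head and one auxiliary head, a heuristically weighted multi-task loss, and frame-level posterior averaging over an ensemble of such models. All names and values here (MultiTaskAcousticDNN, aux_weight, ensemble_posteriors) are illustrative assumptions, not the thesis's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTaskAcousticDNN(nn.Module):
    """DNN acoustic model with shared hidden layers and two task-specific heads."""

    def __init__(self, feat_dim, hidden_dim, num_senones, num_aux_targets):
        super().__init__()
        # Shared trunk: hidden layers reused by the primary and auxiliary tasks.
        self.trunk = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Task-specific output layers.
        self.senone_head = nn.Linear(hidden_dim, num_senones)   # primary task
        self.aux_head = nn.Linear(hidden_dim, num_aux_targets)  # auxiliary task

    def forward(self, x):
        h = self.trunk(x)
        return self.senone_head(h), self.aux_head(h)


def mtl_loss(senone_logits, aux_logits, senone_targets, aux_targets, aux_weight=0.3):
    """Weighted sum of primary and auxiliary cross-entropy losses.

    `aux_weight` stands in for the heuristically chosen task weight the abstract
    refers to; the thesis's adaptation method for setting it is not shown here.
    """
    primary = F.cross_entropy(senone_logits, senone_targets)
    auxiliary = F.cross_entropy(aux_logits, aux_targets)
    return primary + aux_weight * auxiliary


def ensemble_posteriors(models, x):
    """Frame-level ensemble: average senone posteriors over several MTL-trained models."""
    with torch.no_grad():
        probs = [F.softmax(model(x)[0], dim=-1) for model in models]
    return torch.stack(probs).mean(dim=0)
```

As a usage sketch, each frame batch would be forwarded through the shared trunk once, both heads would contribute to `mtl_loss` during training, and at decoding time `ensemble_posteriors` would replace the single-model posteriors fed to the HMM decoder.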