Speech Recognition Quality Estimation-based Semi-Supervised Training for Broadcast Radio Program Transcription

Master's thesis === National Taipei University of Technology === Department of Electronic Engineering (graduate institute) === 105 === It is difficult to collect enough labeled data to train a high-performance automatic speech recognition (ASR) system. Unlabeled speech data, however, can be obtained easily and in virtually unlimited amounts. To take advantage of this, the objective of this work is to use semi-sup...

Bibliographic Details
Main Authors: Sing-Yue Wang, 王星月
Other Authors: 廖元甫
Format: Others
Language: zh-TW
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/v665zj
id ndltd-TW-105TIT05427106
record_format oai_dc
spelling ndltd-TW-105TIT05427106 2019-05-15T23:53:44Z http://ndltd.ncl.edu.tw/handle/v665zj Speech Recognition Quality Estimation-based Semi-Supervised Training for Broadcast Radio Program Transcription 基於半監督式學習之廣播節目語音逐字稿自動轉寫系統 (Semi-Supervised-Learning-Based Automatic Verbatim Transcription System for Broadcast Programs) Sing-Yue Wang 王星月 Master's thesis National Taipei University of Technology Department of Electronic Engineering (graduate institute) 105 廖元甫 2017 thesis 76 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === National Taipei University of Technology === Department of Electronic Engineering (graduate institute) === 105 === It is difficult to collect enough labeled data to train a high-performance automatic speech recognition (ASR) system. Unlabeled speech data, however, can be obtained easily and in virtually unlimited amounts. To take advantage of this, the objective of this work is to use semi-supervised training to improve ASR performance. Quality Estimation (QE) is used to predict the word error rate (WER) of each utterance. The subset of unlabeled utterances predicted to have good recognition quality is then added to the speech recognizer's training data, and the acoustic model is retrained. In the experiments, two test sets of broadcast material are evaluated. With QE-based data selection, the character error rate (CER) is reduced from 25.00% to 23.61% on one test set and from 14.24% to 13.24% on the other. Retraining the language model with Giga Word reduces the CER further, from 23.61% to 23.25% and from 13.24% to 12.63%. Finally, an online Automatic Radio Transcriber is implemented to provide a speech recognition service.
author2 廖元甫
author_facet 廖元甫
Sing-Yue Wang
王星月
author Sing-Yue Wang
王星月
spellingShingle Sing-Yue Wang
王星月
Speech Recognition Quality Estimation-based Semi-Supervised Training for Broadcast Radio Program Transcription
author_sort Sing-Yue Wang
title Speech Recognition Quality Estimation-based Semi-Supervised Training for Broadcast Radio Program Transcription
publishDate 2017
url http://ndltd.ncl.edu.tw/handle/v665zj
_version_ 1719156701604610048
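
The description in this record outlines a selection step: a QE model predicts the WER of each automatically transcribed utterance, and only utterances predicted to be well recognized are added to the training data. The following is a minimal sketch of that idea, not the thesis's implementation. The `Hypothesis` fields, the function names, and the 10% threshold are all assumptions made for illustration; the record does not state how the threshold was chosen. The helper `relative_reduction` simply restates the reported CER figures as relative improvements.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One automatically transcribed utterance (field names are hypothetical)."""
    utt_id: str
    text: str             # ASR output, used as a pseudo-label
    predicted_wer: float  # QE model's estimate, in percent


def select_for_retraining(hyps, max_predicted_wer=10.0):
    """Keep utterances whose predicted WER is at or below the threshold.

    The default threshold of 10.0 is an illustrative assumption,
    not a value taken from the thesis.
    """
    return [h for h in hyps if h.predicted_wer <= max_predicted_wer]


def relative_reduction(before, after):
    """Relative error-rate reduction, in percent of the starting value."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Made-up example utterances; the QE scores are not real data.
    pool = [
        Hypothesis("utt1", "example transcript one", 4.2),
        Hypothesis("utt2", "example transcript two", 37.5),
        Hypothesis("utt3", "example transcript three", 9.8),
    ]
    selected = select_for_retraining(pool)
    print([h.utt_id for h in selected])

    # CER figures reported in the description for QE-based selection.
    for before, after in [(25.00, 23.61), (14.24, 13.24)]:
        print(f"{before:.2f}% -> {after:.2f}%: "
              f"{relative_reduction(before, after):.2f}% relative")
```

In this sketch the selected utterances and their ASR transcripts would be appended to the labeled set before the acoustic model is retrained; a stricter threshold trades the amount of added data for cleaner pseudo-labels.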