Improved discriminative training for Mandarin continuous speech recognition
Master's === National Taiwan Normal University === Graduate Institute of Computer Science and Information Engineering === 95 === This thesis considers improved discriminative training of acoustic models for Mandarin large vocabulary continuous speech recognition (LVCSR). First, we present a new phone accuracy function based on the frame-level accuracy of hypothesized phone arcs instead...
Main Authors: | Shih-Hung Liu 劉士弘 |
---|---|
Other Authors: | Berlin Chen 陳柏琳 |
Format: | Others |
Language: | zh-TW |
Published: | 2007 |
Online Access: | http://ndltd.ncl.edu.tw/handle/53141182379423513068 |
id |
ndltd-TW-095NTNU5392013 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-095NTNU5392013 2016-05-23T04:17:33Z http://ndltd.ncl.edu.tw/handle/53141182379423513068 Improved discriminative training for Mandarin continuous speech recognition 改善鑑別式聲學模型訓練於中文連續語音辨識之研究 Shih-Hung Liu 劉士弘 Master's ; National Taiwan Normal University ; Graduate Institute of Computer Science and Information Engineering ; 95 Berlin Chen 陳柏琳 2007 thesis 123 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan Normal University === Graduate Institute of Computer Science and Information Engineering === 95 === This thesis considers improved discriminative training of acoustic models for Mandarin large vocabulary continuous speech recognition (LVCSR). First, we present a new phone accuracy function based on the frame-level accuracy of hypothesized phone arcs, used in place of the raw phone accuracy function of minimum phone error (MPE) training; the new function can, to some extent, penalize deletion errors sufficiently. Second, we explore a novel data selection approach for discriminative training, based on the normalized frame-level entropy of Gaussian posterior probabilities obtained from the word lattice of each training utterance. Its merit is that it makes the training algorithm focus on the training statistics of those frame samples that lie close to the decision boundary, for better discrimination. The proposed data selection approach is further applied to unsupervised discriminative training of acoustic models. Finally, we investigate a few other modifications of the training objective functions, as well as of the lattice structures, used for accumulating MPE training statistics. Experiments on the Mandarin broadcast news corpus (MATBN) collected in Taiwan show that integrating the frame-level data selection with the new phone accuracy function achieves slight but consistent improvements over conventional MPE training at lower numbers of training iterations.
|
author2 |
Berlin Chen |
author_facet |
Berlin Chen Shih-Hung Liu 劉士弘 |
author |
Shih-Hung Liu 劉士弘 |
author_sort |
Shih-Hung Liu |
title |
Improved discriminative training for Mandarin continuous speech recognition |
publishDate |
2007 |
url |
http://ndltd.ncl.edu.tw/handle/53141182379423513068 |
_version_ |
1718277985217282048 |
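
The abstract describes selecting training frames by the normalized frame-level entropy of Gaussian posterior probabilities, but this record gives no formula. The sketch below is only an illustration under stated assumptions: the entropy of each frame's posterior distribution is divided by the log of the number of Gaussians so that it lies in [0, 1], and frames at or above a threshold are treated as lying near the decision boundary. The function names, the `threshold` parameter, and the toy posteriors are hypothetical and are not taken from the thesis.

```python
import math


def normalized_entropy(posteriors):
    """Entropy of one frame's posterior distribution, scaled to [0, 1].

    Assumption: dividing by log(N), with N the number of Gaussians,
    is the normalization; the record does not state the exact form.
    """
    n = len(posteriors)
    total = sum(posteriors)
    if n < 2 or total <= 0.0:
        return 0.0
    entropy = 0.0
    for p in posteriors:
        q = p / total  # renormalize so the posteriors sum to one
        if q > 0.0:
            entropy -= q * math.log(q)
    return entropy / math.log(n)


def select_frames(frame_posteriors, threshold=0.5):
    """Return indices of frames whose normalized entropy meets the threshold.

    High entropy means the posterior mass is spread over competing
    Gaussians, i.e. the frame is hard to classify.
    """
    return [
        t
        for t, posteriors in enumerate(frame_posteriors)
        if normalized_entropy(posteriors) >= threshold
    ]


# Toy posteriors for three frames over two Gaussians (made-up values).
frames = [[1.0, 0.0], [0.5, 0.5], [0.9, 0.1]]
selected = select_frames(frames, threshold=0.5)
```

In this toy input only the second frame, whose posterior mass is split evenly, passes the threshold; in discriminative training under this assumption, only the statistics of such frames would be accumulated.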