Summary: | Master's === National Chiao Tung University === Institute of Communications Engineering === 105 === This thesis proposes a two-stage automatic segmentation method that uses available databases to train a conventional GMM-HMM acoustic model and a GMM-based boundary model, with the aim of automatically labeling syllable-level segment boundaries in a new target database. In the first stage, initial syllable-level boundaries are obtained by HMM-based forced alignment; in the second stage, the boundary model refines each boundary within a local range. A small number of utterances are used as adaptation data for speaker-adaptive training of the boundary model, so that the statistics of the model parameters match those of the test data, which improves the boundary refinement. In the experiments, lecture videos and captions from the National Chiao Tung University Open Course Website (NCTU OCW) were chosen as the source of the target database, and the TCC300 training set was used to train the GMM-HMM baseline model; a fast broadcast read speech database and part of the OCW training set were used to train the boundary model, including background and speaker adaptation. The goal is a highly automatic syllable-level segment boundary labeling system.
|
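The second-stage refinement described in the summary can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `refine_boundaries`, the frame-index representation, and the `score` callback are assumptions made for illustration. In the thesis the score would come from the GMM-based boundary model; here a toy function stands in for it. The sketch assumes the initial boundaries (as produced by forced alignment) are strictly increasing frame indices, and it moves each one to the best-scoring frame within a local window without letting boundaries cross.

```python
from typing import Callable, List


def refine_boundaries(
    initial: List[int],
    score: Callable[[int], float],
    radius: int,
    num_frames: int,
) -> List[int]:
    """Move each initial boundary to the best-scoring frame within +/- radius.

    `initial` is assumed to hold strictly increasing frame indices in
    [0, num_frames). `score(frame)` is a hypothetical stand-in for the
    boundary model's log-likelihood that a boundary lies at `frame`.
    """
    refined: List[int] = []
    for i, b in enumerate(initial):
        # Local search window around the forced-alignment boundary.
        lo = max(0, b - radius)
        hi = min(num_frames - 1, b + radius)
        # Keep the order of boundaries: stay after the previous refined
        # boundary and before the next initial boundary.
        if refined:
            lo = max(lo, refined[-1] + 1)
        if i + 1 < len(initial):
            hi = min(hi, initial[i + 1] - 1)
        # Pick the candidate frame with the highest score.
        refined.append(max(range(lo, hi + 1), key=score))
    return refined


# Toy example: the "true" boundaries are at frames 12 and 31, and the toy
# score simply prefers frames close to one of them.
TRUE_BOUNDARIES = [12, 31]


def toy_score(frame: int) -> float:
    return -min(abs(frame - t) for t in TRUE_BOUNDARIES)


print(refine_boundaries([10, 33], toy_score, radius=3, num_frames=50))
# prints [12, 31]
```

The window radius controls how far the second stage may move away from the forced-alignment result; with a radius of 0 the initial boundaries are returned unchanged.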