Summary: | Master's thesis === National Tsing Hua University === Institute of Statistics === 92 === This thesis compares the performance of a speech recognition system trained on two speech corpora. From the dictionary of the Daiim input method, we select two sets of words that cover all the cross-syllable biphones and all the cross-syllable triphones, called the biphone-rich and triphone-rich sets respectively. Complete coverage of the cross-syllable triphones requires about ten times as many words as complete coverage of the cross-syllable biphones. To allow a fair comparison, the biphone-rich corpus therefore consists of ten sets of words, each of which covers all the cross-syllable biphones. Notably, the triphone coverage of this biphone-rich corpus is much lower than that of the triphone-rich set.
Using these words as the transcript, a male Taiwanese speaker recorded every word as microphone speech. The resulting speech corpora, about 100 minutes for each set, are used to train the acoustic models. Although both corpora perform quite well in tasks with linear-net and free-syllable-net recognition networks, the triphone-rich corpus shows no advantage over the biphone-rich corpus.
|
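The summary does not say how the biphone-rich and triphone-rich word sets were chosen from the dictionary. One plausible way to think about it is as a set-cover problem, and the sketch below uses a plain greedy heuristic for that. Everything in it is an assumption made for illustration: the function names `cross_syllable_units` and `greedy_cover`, the representation of a word as a list of syllables (each a list of phone symbols), and the definition of a cross-syllable unit as any run of `n` consecutive phones that spans a syllable boundary within a word.

```python
from typing import Dict, List, Set, Tuple

Unit = Tuple[str, ...]


def cross_syllable_units(syllables: List[List[str]], n: int) -> Set[Unit]:
    """Return the n-phone units of one word that span a syllable boundary.

    Assumption: a unit is "cross-syllable" when its n consecutive phones
    include phones from both sides of at least one syllable boundary.
    """
    phones = [p for syl in syllables for p in syl]
    # Index of the first phone of every syllable after the first one.
    boundaries = []
    pos = 0
    for syl in syllables[:-1]:
        pos += len(syl)
        boundaries.append(pos)
    units: Set[Unit] = set()
    for start in range(len(phones) - n + 1):
        end = start + n
        if any(start < b < end for b in boundaries):
            units.add(tuple(phones[start:end]))
    return units


def greedy_cover(lexicon: Dict[str, List[List[str]]], n: int) -> List[str]:
    """Greedily pick words until every cross-syllable n-phone unit is covered.

    This is the standard greedy set-cover heuristic, not necessarily the
    procedure used in the thesis.
    """
    units_by_word = {w: cross_syllable_units(s, n) for w, s in lexicon.items()}
    uncovered: Set[Unit] = set().union(*units_by_word.values())
    chosen: List[str] = []
    while uncovered:
        # Iterate in sorted order so ties are broken deterministically.
        best = max(sorted(units_by_word),
                   key=lambda w: len(units_by_word[w] & uncovered))
        chosen.append(best)
        uncovered -= units_by_word[best]
    return chosen


# Hypothetical toy lexicon: three two-syllable words with made-up phone symbols.
toy_lexicon = {
    "w1": [["a", "b"], ["c", "d"]],
    "w2": [["a", "b"], ["c", "e"]],
    "w3": [["x", "b"], ["c", "d"]],
}
biphone_words = greedy_cover(toy_lexicon, 2)
triphone_words = greedy_cover(toy_lexicon, 3)
```

In this toy lexicon all three words share the single cross-syllable biphone, so one word suffices for biphone coverage, while covering the four distinct cross-syllable triphones takes all three words. This mirrors, on a very small scale, the summary's observation that triphone coverage needs many more words than biphone coverage.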