Summary: | Master's thesis === National Chiao Tung University === Institute of Communications Engineering === 104 === In this thesis we research two main topics. The first is improving language models, including improving the parser, text normalization, the lexicon, and so on. We select common words for training the language models in order to reduce complexity, and discard some frequent but uncommon words. We convert the language models into a weighted finite-state transducer (WFST) and apply it to syllable-sequence recognition; compared with conventional recognition systems, the WFST is relatively small and efficient. Finally, we measure the performance of the language model by recognition rate and complexity.
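The n-gram language model described above can be viewed as a weighted automaton whose states are word histories and whose arcs carry negative log probabilities; a minimal sketch of that view for a bigram syllable model, with a hypothetical toy corpus (all data and names here are illustrative, not from the thesis), might look like:

```python
import math
from collections import defaultdict

# Hypothetical toy corpus of syllable sequences (illustrative only).
corpus = [["ni", "hao", "ma"], ["ni", "hao"], ["hao", "ma"]]

# Build bigram counts with a start symbol "<s>".
bigram = defaultdict(lambda: defaultdict(int))
for sent in corpus:
    prev = "<s>"
    for syl in sent:
        bigram[prev][syl] += 1
        prev = syl

def arcs(state):
    """Arcs leaving a state: syllable -> weight = -log P(syllable | state)."""
    total = sum(bigram[state].values())
    return {syl: -math.log(c / total) for syl, c in bigram[state].items()}

def score(sequence):
    """Total -log probability of a syllable sequence (None if an arc is missing)."""
    state, cost = "<s>", 0.0
    for syl in sequence:
        a = arcs(state)
        if syl not in a:
            return None
        cost += a[syl]
        state = syl
    return cost

print(score(["ni", "hao", "ma"]))
```

A real system would determinize and minimize this machine (e.g. with an FST toolkit) and compose it with acoustic-model lattices, which is what keeps the transducer small and the search efficient.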
In addition, we hope to extract more and deeper word information from the text corpus. That is, we extract word information from E-HowNet and the training corpus to assign each word (training example) a word vector, compute the cosine similarity between words, and apply the K-nearest-neighbors algorithm to label each word with one or more semantic classes. We further discuss the word information, the accuracy of the semantic labeling, etc.
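The cosine-similarity and K-nearest-neighbors labeling step can be sketched as follows; the word vectors and semantic labels below are purely illustrative stand-ins for the E-HowNet-derived features described in the thesis:

```python
import math
from collections import Counter

# Toy word vectors and semantic labels (illustrative only).
vectors = {
    "dog": [1.0, 0.9, 0.1],
    "cat": [0.9, 1.0, 0.2],
    "car": [0.1, 0.2, 1.0],
    "bus": [0.2, 0.1, 0.9],
}
labels = {"dog": "animal", "cat": "animal", "car": "vehicle", "bus": "vehicle"}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def knn_label(word_vec, k=3):
    """Label a vector by majority vote among its k most similar labeled words."""
    neighbors = sorted(labels, key=lambda w: cosine(word_vec, vectors[w]),
                       reverse=True)[:k]
    vote, _ = Counter(labels[w] for w in neighbors).most_common(1)[0]
    return vote

print(knn_label([0.95, 0.85, 0.15]))  # close to "dog"/"cat", so votes "animal"
```

To label a word with more than one semantic class, as the thesis allows, one could keep every label whose vote count passes a threshold instead of taking only the single majority label.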
|