Master's thesis, National Taiwan University, Graduate Institute of Computer Science and Information Engineering, academic year 88.

This thesis proposes a sense tagger for Mandarin Chinese. The tagger performs word sense disambiguation using contextual information and a mapping from WordNet synsets to Cilin sense tags. When the small categories of Cilin (1,428 senses) are used, and 1, 2, and 3 candidate senses are proposed for words of low (2-4 senses), middle (5-8 senses), and high (more than 8 senses) ambiguity, respectively, the average tagging performance is 63.36%. The performance on unknown words is 34.35%, which is better than that of the baseline model. This sense tagger helped us build a large-scale sense-tagged corpus from the Academia Sinica Balanced Corpus (ASBC).
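The candidate-proposal scheme above can be illustrated with a small sketch. It assumes that the "1-3 candidates, respectively" setting means proposing 1, 2, or 3 senses for low, middle, and high ambiguity, and it scores senses by a simple count of context words that appear in a hypothetical per-sense signature set. The scoring function, the `sense_signatures` dictionary, and the tag labels are illustrative assumptions, not the thesis's actual model.

```python
from __future__ import annotations

from collections import Counter


def candidates_to_propose(num_senses: int) -> int:
    """Number of candidate senses to propose for a word (assumed 1/2/3 scheme)."""
    if num_senses <= 4:   # unambiguous, or low ambiguity: 2-4 senses
        return 1
    if num_senses <= 8:   # middle ambiguity: 5-8 senses
        return 2
    return 3              # high ambiguity: more than 8 senses


def tag_word(context: list[str], sense_signatures: dict[str, set[str]]) -> list[str]:
    """Rank a word's candidate sense tags by context overlap and return the top k.

    sense_signatures is a hypothetical mapping from each candidate sense tag of
    the target word to a set of words indicative of that sense.
    """
    counts = Counter(context)

    def score(tag: str) -> int:
        # Count how many context tokens fall inside the sense's signature set.
        return sum(counts[w] for w in sense_signatures[tag])

    # Highest score first; ties broken by tag name so the output is deterministic.
    ranked = sorted(sense_signatures, key=lambda t: (-score(t), t))
    return ranked[: candidates_to_propose(len(sense_signatures))]
```

For example, a word with two hypothetical senses `S1` (signature `{"銀行", "存款"}`) and `S2` (signature `{"河", "岸"}`) in the context `["我", "去", "銀行", "存款"]` is tagged `["S1"]`, since a low-ambiguity word receives a single candidate.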
This thesis also proposes a method for constructing a Chinese-English WordNet automatically. Chinese words are mapped to WordNet synsets according to their word senses. Besides building the mapping between Chinese Cilin senses and English WordNet synsets, we also set up a Chinese lexical knowledge base. The results are applied to Chinese-English cross-language information retrieval (CLIR): when the Chinese-English WordNet is used in our CLIR experiment, it achieves 69.7% of the effectiveness of monolingual IR.
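A minimal sketch of how a Cilin-to-synset mapping could yield a Chinese-English word mapping and a simple dictionary-style query translation is shown below. The data structures, the Cilin tags `T1`/`T2`, and the synset identifiers (written in NLTK's `bank.n.01` naming style) are hypothetical; the sketch translates each Chinese word into the English lemmas of all its mapped synsets, which is one plausible use and not necessarily the thesis's retrieval method. The last function expresses the reported figure as a ratio of CLIR effectiveness to monolingual effectiveness.

```python
from __future__ import annotations


def build_chinese_english_wordnet(
    cilin_tags: dict[str, list[str]],
    tag_to_synsets: dict[str, list[str]],
) -> dict[str, set[str]]:
    """Map each Chinese word to WordNet synsets via its (hypothetical) Cilin tags."""
    mapping: dict[str, set[str]] = {}
    for word, tags in cilin_tags.items():
        synsets: set[str] = set()
        for tag in tags:
            synsets.update(tag_to_synsets.get(tag, []))
        if synsets:  # words whose tags map to no synset are left out
            mapping[word] = synsets
    return mapping


def translate_query(
    query_words: list[str],
    zh_to_synsets: dict[str, set[str]],
    synset_lemmas: dict[str, list[str]],
) -> list[str]:
    """Replace Chinese query words with the English lemmas of their synsets."""
    english: list[str] = []
    for word in query_words:
        for synset in sorted(zh_to_synsets.get(word, set())):
            for lemma in synset_lemmas.get(synset, []):
                if lemma not in english:  # keep first occurrence, drop duplicates
                    english.append(lemma)
    return english


def relative_effectiveness(clir_score: float, monolingual_score: float) -> float:
    """CLIR effectiveness as a fraction of the monolingual baseline."""
    return clir_score / monolingual_score
```

Under this reading, a `relative_effectiveness` value of 0.697 corresponds to the 69.7% figure reported above; untranslatable query words are simply dropped, a simplification a real system would need to handle.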