Summary: | Master's === National Tsing Hua University === Department of Computer Science === 90 === WordNet is a lexical database that organizes English nouns, verbs, adjectives, and adverbs according to word senses and the relationships between senses. It has been applied increasingly to many knowledge-based NLP tasks as the main lexical resource because of its wide-coverage semantic and conceptual information. WordNets for many European languages other than English have been developed in recent years. This paper proposes an approach to the semi-automatic construction of a Chinese WordNet using a class-based statistical model.
Our approach to constructing a Chinese WordNet is to translate the English WordNet. The main problem we have to tackle is selecting the appropriate translation for each word sense. We observe that English words for a common concept tend to share Chinese characters in their translations. Our method consists of 1) classifying English words into several semantic classes and 2) building a class-based statistical model for estimating word translation probabilities. We carried out experiments on nouns in WordNet and evaluated the results based on coverage and recall rate.
The evaluation shows that our approach achieves 76.43% coverage. The recall rate is 70%, 80%, and 90% when the top 1, top 2, and top 3 translations are used, respectively.
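The observation that translations within a semantic class tend to share Chinese characters can be illustrated with a minimal sketch. The abstract does not spell out the actual model, so everything here is an assumption made for illustration: the relative-frequency estimate of P(character | class), the averaging score, the function names, and the toy data are all hypothetical.

```python
from collections import Counter


def train_class_char_model(class_translations):
    """Estimate P(character | semantic class) by relative frequency.

    class_translations maps a class name to known Chinese translations
    of words in that class (hypothetical seed data).
    """
    model = {}
    for sem_class, translations in class_translations.items():
        counts = Counter(ch for t in translations for ch in t)
        total = sum(counts.values())
        model[sem_class] = {ch: n / total for ch, n in counts.items()}
    return model


def score(candidate, sem_class, model, floor=1e-6):
    """Average per-character probability of a candidate under a class.

    Unseen characters get a small floor value (an assumed smoothing choice).
    """
    if not candidate:
        return 0.0
    probs = model.get(sem_class, {})
    return sum(probs.get(ch, floor) for ch in candidate) / len(candidate)


def rank_translations(candidates, sem_class, model):
    """Sort candidate translations from most to least likely for the class."""
    return sorted(candidates, key=lambda c: score(c, sem_class, model), reverse=True)


# Toy seed data (illustrative only, not taken from the thesis).
seed = {
    "fish": ["鱒魚", "鮭魚", "鯉魚"],
    "music": ["樂器", "鋼琴", "低音管"],
}
model = train_class_char_model(seed)

# "bass" is ambiguous: the class of the sense decides which translation wins.
candidates = ["貝斯", "鱸魚", "低音"]
fish_ranking = rank_translations(candidates, "fish", model)
music_ranking = rank_translations(candidates, "music", model)
print(fish_ranking[0], music_ranking[0])
```

In this toy setup the fish sense of "bass" prefers the candidate that shares 魚 with the other fish translations, while the music sense prefers the candidate that shares characters with the music class. Keeping the top k entries of each ranking corresponds to the top 1, top 2, and top 3 settings used in the evaluation.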
|