Summary: Lexical knowledge bases such as WordNet have been shown to be useful in a wide range of language processing applications. However, WordNet lacks certain information, such as topical relations between synsets. This thesis addresses this problem by enriching WordNet with information derived from Wikipedia. The approach maps concepts in WordNet to corresponding articles in Wikipedia in three stages. First, a set of candidate articles is retrieved for each WordNet concept, both by searching on article titles and by searching the full article text with an IR engine. Second, text similarity scores are used to select the best match from the candidate articles. Finally, the mappings are refined using information from Wikipedia links to give a set of high-quality matches. The mappings are evaluated against a manually annotated gold-standard set of synset-article mappings. The annotation process indicates that the majority of synsets have a good matching article, and the refined mappings achieve a precision of 88.2%. The mappings are then used to enrich the relations in WordNet with Wikipedia links, and the enriched WordNet is used in a knowledge-based Word Sense Disambiguation system. Evaluations on the Semcor 3.0 corpus show that adding the new relations significantly improves performance over the WordNet baseline, demonstrating the usefulness of the mappings on an extrinsic task.
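As a rough illustration of the second stage, the following sketch ranks candidate articles by bag-of-words cosine similarity between a synset's text (for example its lemmas plus its gloss) and each article's text, then keeps the highest-scoring candidate. The summary does not specify which similarity measure is used, so the cosine measure, the simple tokenizer, and the names `tokenize`, `cosine_similarity` and `best_candidate` are hypothetical choices for illustration, not the thesis's implementation.

```python
import math
import re
from collections import Counter
from typing import Dict, Optional, Tuple


def tokenize(text: str) -> list:
    # Lowercase and split on non-alphanumeric characters (a deliberately simple tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Cosine of the angle between two term-frequency vectors; 0.0 if either is empty.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_candidate(synset_text: str, candidates: Dict[str, str]) -> Optional[Tuple[str, float]]:
    # Hypothetical stage-2 selector: score every candidate article (title -> text)
    # against the synset's text and return the best (title, score) pair.
    if not candidates:
        return None
    synset_vec = Counter(tokenize(synset_text))
    scored = [
        (title, cosine_similarity(synset_vec, Counter(tokenize(body))))
        for title, body in candidates.items()
    ]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy example with made-up texts, not data from the thesis.
    synset = "bank sloping land beside a body of water"
    candidates = {
        "Bank": "A bank is a financial institution that accepts deposits",
        "Bank (geography)": "In geography a bank is the land alongside a body of water",
    }
    print(best_candidate(synset, candidates)[0])  # → Bank (geography)
```

Plain term-frequency cosine is used here only because it is short and dependency-free; a real system would more likely weight terms (for example with TF-IDF) and strip stop words such as "a" and "of", which this sketch counts like any other token.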