Word Sense Disambiguation Based on Large Scale Polish CLARIN Heterogeneous Lexical Resources
Lexical resources can be applied in many different Natural Language Engineering tasks, but the most fundamental task is the recognition of word senses used in text contexts. The problem is difficult, not ye...
Main Authors: | Paweł Kędzia, Maciej Piasecki, Marlena Orlińska |
---|---|
Format: | Article |
Language: | English |
Published: | Institute of Slavic Studies, Polish Academy of Sciences, 2015-12-01 |
Series: | Cognitive Studies | Études cognitives |
Subjects: | word sense disambiguation, WSD, page rank, plWordNet, graphs, lexical resources, SUMO |
Online Access: | https://ispan.waw.pl/journals/index.php/cs-ec/article/view/1170 |
id |
doaj-5d9df68ed02c4e1bbd7bc1010a577c3f |
---|---|
record_format |
Article |
spelling |
doaj-5d9df68ed02c4e1bbd7bc1010a577c3f; 2020-11-24T23:33:51Z; eng; Institute of Slavic Studies, Polish Academy of Sciences; Cognitive Studies | Études cognitives; 2392-2397; 2015-12-01; 01526929210.11649/cs.2015.019936; Word Sense Disambiguation Based on Large Scale Polish CLARIN Heterogeneous Lexical Resources; Paweł Kędzia, Maciej Piasecki, Marlena Orlińska (all three: Politechnika Wrocławska [Wrocław University of Technology], Wrocław); https://ispan.waw.pl/journals/index.php/cs-ec/article/view/1170; word sense disambiguation, WSD, page rank, plWordNet, graphs, lexical resources, SUMO |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Paweł Kędzia, Maciej Piasecki, Marlena Orlińska |
author_sort |
Paweł Kędzia |
title |
Word Sense Disambiguation Based on Large Scale Polish CLARIN Heterogeneous Lexical Resources |
publisher |
Institute of Slavic Studies, Polish Academy of Sciences |
series |
Cognitive Studies | Études cognitives |
issn |
2392-2397 |
publishDate |
2015-12-01 |
description |
Lexical resources can be applied in many different Natural Language Engineering tasks, but the most fundamental task is the recognition of word senses used in text contexts. The problem is difficult and not yet fully solved, and different lexical resources provide varying support for it. Polish CLARIN lexical semantic resources are based on plWordNet, a very large wordnet for Polish, as a central structure that serves as a basis for linking together several resources of different types. In this paper, several Word Sense Disambiguation (henceforth WSD) methods developed for Polish that utilise plWordNet are discussed. Textual sense descriptions in a traditional lexicon can be compared with text contexts using Lesk's algorithm in order to find the best matching senses. In the case of a wordnet, lexico-semantic relations provide the main description of word senses. Thus, first, we adapted and applied to Polish a WSD method based on PageRank. In this method, text words are mapped onto their senses in the plWordNet graph and the PageRank algorithm is run to find the senses with the highest scores. The method achieves results that are lower than, but comparable to, those reported for English. The error analysis showed that the main problems are fine-grained sense distinctions in plWordNet and the limited number of connections between words of different parts of speech. In the second approach, plWordNet expanded with a mapping onto SUMO ontology concepts was used. Two scenarios for WSD were investigated: two-step disambiguation, and disambiguation based on combined networks of plWordNet and SUMO. In the former scenario, words are first assigned SUMO concepts and then plWordNet senses are disambiguated. In the latter, plWordNet and SUMO are combined into one large network, which is then used for the disambiguation of senses. The additional knowledge sources used in WSD improved the performance. The obtained results and potential further lines of development are discussed. |
topic |
word sense disambiguation, WSD, page rank, plWordNet, graphs, lexical resources, SUMO |
url |
https://ispan.waw.pl/journals/index.php/cs-ec/article/view/1170 |
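The PageRank-based method summarised in the description field can be illustrated with a small sketch. This is not the authors' implementation: the sense identifiers, the toy relation graph, the function names, and the damping factor of 0.85 are all hypothetical, and the choice to concentrate the teleport distribution on the candidate senses of the context words (a personalized PageRank) is an assumption about how "mapping text words onto their senses" might be realised.

```python
from collections import defaultdict


def build_graph(edges):
    """Turn undirected (sense, sense) relations into an adjacency map."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def pagerank(graph, personalization, damping=0.85, iterations=50):
    """Power-iteration PageRank with a personalized teleport distribution.

    Every node produced by build_graph has at least one neighbour,
    so no special handling of dangling nodes is needed here.
    """
    nodes = list(graph)
    total = sum(personalization.get(n, 0.0) for n in nodes)
    if total == 0.0:
        # No candidate sense is in the graph: fall back to uniform teleport.
        teleport = {n: 1.0 / len(nodes) for n in nodes}
    else:
        teleport = {n: personalization.get(n, 0.0) / total for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            share = damping * rank[n] / len(graph[n])
            for m in graph[n]:
                new_rank[m] += share
        rank = new_rank
    return rank


def disambiguate(context_words, sense_inventory, graph):
    """Pick, for each context word, its candidate sense with the highest score."""
    candidates = [s for w in context_words for s in sense_inventory.get(w, [])]
    scores = pagerank(graph, {s: 1.0 for s in candidates})
    return {
        w: max(sense_inventory[w], key=lambda s: scores.get(s, 0.0))
        for w in context_words
        if w in sense_inventory
    }


# Hypothetical toy data, not taken from plWordNet.
SENSES = {
    "zamek": ["zamek.1", "zamek.2"],  # 1: castle, 2: door lock
    "klucz": ["klucz.1", "klucz.2"],  # 1: key for a lock, 2: musical clef
    "drzwi": ["drzwi.1"],             # door
}
GRAPH = build_graph([
    ("zamek.2", "drzwi.1"),
    ("zamek.2", "klucz.1"),
    ("klucz.1", "drzwi.1"),
    ("zamek.1", "budowla.1"),  # castle is a kind of building
    ("klucz.2", "nuta.1"),     # clef is related to musical note
])

result = disambiguate(["zamek", "klucz", "drzwi"], SENSES, GRAPH)
print(result)
```

On this toy graph the "lock" sense of zamek outranks the "castle" sense because it is linked to candidate senses of the other context words, which is the intuition behind graph-based WSD. A real system would run over the full wordnet relation graph, where fine-grained senses and sparse cross-part-of-speech links (the problems noted in the description) make the ranking much harder.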