Word Sense Disambiguation Based on Large Scale Polish CLARIN Heterogeneous Lexical Resources

Lexical resources can be applied in many different Natural Language Engineering tasks, but the most fundamental task is the recognition of word senses used in text contexts. The problem is difficult, not ye...

Bibliographic Details
Main Authors: Paweł Kędzia, Maciej Piasecki, Marlena Orlińska
Format: Article
Language: English
Published: Institute of Slavic Studies, Polish Academy of Sciences 2015-12-01
Series: Cognitive Studies | Études cognitives
Subjects: WSD
Online Access: https://ispan.waw.pl/journals/index.php/cs-ec/article/view/1170
id doaj-5d9df68ed02c4e1bbd7bc1010a577c3f
record_format Article
spelling doaj-5d9df68ed02c4e1bbd7bc1010a577c3f; 2020-11-24T23:33:51Z; eng; Institute of Slavic Studies, Polish Academy of Sciences; Cognitive Studies | Études cognitives; 2392-2397; 2015-12-01; 01526929210.11649/cs.2015.019936; Word Sense Disambiguation Based on Large Scale Polish CLARIN Heterogeneous Lexical Resources; Paweł Kędzia, Maciej Piasecki, Marlena Orlińska (all: Politechnika Wrocławska [Wrocław University of Technology], Wrocław); https://ispan.waw.pl/journals/index.php/cs-ec/article/view/1170; word sense disambiguation; WSD; page rank; plWordNet; graphs; lexical resources; SUMO
collection DOAJ
language English
format Article
sources DOAJ
author Paweł Kędzia
Maciej Piasecki
Marlena Orlińska
author_sort Paweł Kędzia
title Word Sense Disambiguation Based on Large Scale Polish CLARIN Heterogeneous Lexical Resources
publisher Institute of Slavic Studies, Polish Academy of Sciences
series Cognitive Studies | Études cognitives
issn 2392-2397
publishDate 2015-12-01
description Lexical resources can be applied in many different Natural Language Engineering tasks, but the most fundamental task is the recognition of word senses used in text contexts. The problem is difficult and not yet fully solved, and different lexical resources provide varied support for it. Polish CLARIN lexical semantic resources are built around plWordNet, a very large wordnet for Polish, which serves as the central structure linking together several resources of different types. In this paper, several Word Sense Disambiguation (henceforth WSD) methods developed for Polish that utilise plWordNet are discussed. Textual sense descriptions in a traditional lexicon can be compared with text contexts using Lesk’s algorithm in order to find the best matching senses. In the case of a wordnet, lexico-semantic relations provide the main description of word senses. Thus, we first adapted and applied to Polish a WSD method based on PageRank: text words are mapped onto their senses in the plWordNet graph, and the PageRank algorithm is run to find the senses with the highest scores. The method yields results that are lower than, but comparable to, those reported for English. The error analysis showed that the main problems are fine-grained sense distinctions in plWordNet and the limited number of connections between words of different parts of speech. In the second approach, plWordNet expanded with a mapping onto SUMO ontology concepts was used. Two scenarios for WSD were investigated: two-step disambiguation and disambiguation based on a combined network of plWordNet and SUMO. In the former scenario, words are first assigned SUMO concepts and then plWordNet senses are disambiguated. In the latter, plWordNet and SUMO are combined into one large network, which is then used for the disambiguation of senses. The additional knowledge sources used in WSD improved the performance. The obtained results and potential further lines of development are discussed.
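The graph-based step summarised in the description (mapping text words onto sense nodes and ranking those nodes) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy graph, the sense labels such as `zamek#lock`, and the function names are hypothetical, and the use of a personalised PageRank seeded on the candidate senses of the context words is an assumption, since the abstract only says that Page Rank is run on the plWordNet graph.

```python
from collections import defaultdict

# Hypothetical toy sense graph. The labels are illustrative only and are
# NOT real plWordNet identifiers or relations.
EDGES = [
    ("zamek#castle", "budynek#building"),
    ("zamek#castle", "wieża#tower"),
    ("zamek#lock", "drzwi#door"),
    ("zamek#lock", "klucz#key"),
    ("klucz#key", "drzwi#door"),
]

# Hypothetical candidate senses for each text word.
CANDIDATES = {
    "zamek": ["zamek#castle", "zamek#lock"],
    "klucz": ["klucz#key"],
    "drzwi": ["drzwi#door"],
}


def build_graph(edges):
    """Build an undirected adjacency map from a list of edges."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration in which the random jump lands only on seed nodes."""
    nodes = list(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            # Every node here has at least one neighbour, so no division by zero.
            share = damping * rank[n] / len(graph[n])
            for m in graph[n]:
                new_rank[m] += share
        rank = new_rank
    return rank


def disambiguate(context_words, graph, candidates):
    """Pick, for each known word, the candidate sense with the highest rank."""
    seeds = {s for w in context_words for s in candidates.get(w, [])}
    rank = personalized_pagerank(graph, seeds)
    return {
        w: max(candidates[w], key=lambda s: rank[s])
        for w in context_words
        if w in candidates
    }


graph = build_graph(EDGES)
senses = disambiguate(["zamek", "klucz", "drzwi"], graph, CANDIDATES)
print(senses["zamek"])
```

In this toy context the senses of "klucz" and "drzwi" reinforce the neighbouring sense of "zamek", so the densely connected sense collects more rank than the isolated one. A sparse graph, such as the limited cross-part-of-speech links the description mentions, would give the ranking less evidence to work with.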
topic word sense disambiguation
WSD
page rank
plWordNet
graphs
lexical resources
SUMO
url https://ispan.waw.pl/journals/index.php/cs-ec/article/view/1170