Word Sense Disambiguation Based on Large Scale Polish CLARIN Heterogeneous Lexical Resources

Lexical resources can be applied in many different Natural Language Engineering tasks, but the most fundamental task is the recognition of word senses used in text contexts. The problem is difficult, not ye...

Bibliographic Details
Main Authors:	Paweł Kędzia, Maciej Piasecki, Marlena Orlińska
Format:	Article
Language:	English
Published:	Institute of Slavic Studies, Polish Academy of Sciences 2015-12-01
Series:	Cognitive Studies | Études cognitives
Subjects:	word sense disambiguation; WSD; page rank; plWordNet; graphs; lexical resources; SUMO
Online Access:	https://ispan.waw.pl/journals/index.php/cs-ec/article/view/1170

id	doaj-5d9df68ed02c4e1bbd7bc1010a577c3f
record_format	Article
spelling	doaj-5d9df68ed02c4e1bbd7bc1010a577c3f; 2020-11-24T23:33:51Z; eng; Institute of Slavic Studies, Polish Academy of Sciences; Cognitive Studies | Études cognitives; 2392-2397; 2015-12-01; 0; 15; 269; 292; 10.11649/cs.2015.019; 936; Word Sense Disambiguation Based on Large Scale Polish CLARIN Heterogeneous Lexical Resources; Paweł Kędzia 0; Maciej Piasecki 1; Marlena Orlińska 2; Politechnika Wrocławska [Wrocław University of Technology], Wrocław; Politechnika Wrocławska [Wrocław University of Technology], Wrocław; Politechnika Wrocławska [Wrocław University of Technology], Wrocław; https://ispan.waw.pl/journals/index.php/cs-ec/article/view/1170; word sense disambiguation; WSD; page rank; plWordNet; graphs; lexical resources; SUMO
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Paweł Kędzia Maciej Piasecki Marlena Orlińska
spellingShingle	Paweł Kędzia Maciej Piasecki Marlena Orlińska Word Sense Disambiguation Based on Large Scale Polish CLARIN Heterogeneous Lexical Resources Cognitive Studies | Études cognitives word sense disambiguation WSD page rank plWordNet graphs lexical resources SUMO
author_facet	Paweł Kędzia Maciej Piasecki Marlena Orlińska
author_sort	Paweł Kędzia
title	Word Sense Disambiguation Based on Large Scale Polish CLARIN Heterogeneous Lexical Resources
title_sort	word sense disambiguation based on large scale polish clarin heterogeneous lexical resources
publisher	Institute of Slavic Studies, Polish Academy of Sciences
series	Cognitive Studies | Études cognitives
issn	2392-2397
publishDate	2015-12-01
description	Lexical resources can be applied in many different Natural Language Engineering tasks, but the most fundamental task is the recognition of word senses used in text contexts. The problem is difficult and not yet fully solved, and different lexical resources provide varied support for it. Polish CLARIN lexical semantic resources are based on plWordNet, a very large wordnet for Polish, as a central structure that serves as a basis for linking together several resources of different types. In this paper, several Word Sense Disambiguation (henceforth WSD) methods developed for Polish that utilise plWordNet are discussed. Textual sense descriptions in a traditional lexicon can be compared with text contexts using Lesk's algorithm in order to find the best matching senses. In the case of a wordnet, lexico-semantic relations provide the main description of word senses. Thus, first, we adapted and applied to Polish a WSD method based on Page Rank. In this method, text words are mapped onto their senses in the plWordNet graph and the Page Rank algorithm is run to find the senses with the highest scores. The method yields results that are lower than, but comparable to, those reported for English. The error analysis showed that the main problems are fine-grained sense distinctions in plWordNet and the limited number of connections between words of different parts of speech. In the second approach, plWordNet expanded with a mapping onto SUMO ontology concepts was used. Two scenarios for WSD were investigated: two-step disambiguation, and disambiguation based on a combined network of plWordNet and SUMO. In the former scenario, words are first assigned SUMO concepts and then plWordNet senses are disambiguated. In the latter, plWordNet and SUMO are combined into one large network, which is then used for the disambiguation of senses. The additional knowledge sources used in WSD improved the performance. The obtained results and potential further lines of development are discussed.
topic	word sense disambiguation; WSD; page rank; plWordNet; graphs; lexical resources; SUMO
url	https://ispan.waw.pl/journals/index.php/cs-ec/article/view/1170
work_keys_str_mv	AT pawełkedzia wordsensedisambiguationbasedonlargescalepolishclarinheterogeneouslexicalresources AT maciejpiasecki wordsensedisambiguationbasedonlargescalepolishclarinheterogeneouslexicalresources AT marlenaorlinska wordsensedisambiguationbasedonlargescalepolishclarinheterogeneouslexicalresources
_version_	1725530756826005504

Similar Items