Grounding sloWNet on Slovene corpus data

Bibliographic Details
Main Authors: Darja Fišer, Maciej Piasecki, Bartosz Broda
Format: Article
Language: English
Published: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts), 2013-12-01
Series: Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave (Empirical, Applied and Interdisciplinary Research)
Online Access: http://www.trojina.org/slovenscina2.0/arhiv/2013/2/Slo2.0_2013_2_05.pdf
Affiliations: Darja Fišer, Faculty of Arts, Ljubljana; Maciej Piasecki, Institute of Informatics, Wroclaw; Bartosz Broda, Institute of Informatics, Wroclaw
ISSN: 2335-2736

Abstract

Wordnets can either be translated from another language or built from corpus evidence. The transfer approach is easier and quicker, which is why it has been the most widely used. Its major disadvantage, however, is that the resulting resource does not necessarily reflect the language in question. In this paper, we therefore test a language-motivated approach that uses linguistically annotated corpus data and basic statistical methods to extract lists of semantically similar words, which are then incorporated into the wordnet for Slovene. The approach was originally developed for Polish, but because the algorithm itself is language-independent and can use minimally annotated corpus resources in any language, it is also attractive for other languages that still lack an extensive wordnet or a similar semantic lexicon. An important advantage of the approach is that it relies on real linguistic evidence harvested from a corpus, yielding a linguistically sound organization of the vocabulary. As all previous approaches to constructing the Slovene wordnet were transfer-based and relied on the English Princeton WordNet, the encouraging results obtained in the presented experiment will be a welcome complement to the existing semantic network.

Keywords: lexical semantics, wordnet, semantic similarity
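The abstract describes extracting lists of semantically similar words from annotated corpus data with basic statistical methods. This record does not specify the measure the authors used, so what follows is only a generic distributional-similarity sketch, not their algorithm: each noun is represented by the lemma/POS features found in a small window around it, raw counts are reweighted with positive pointwise mutual information (PPMI), and words are ranked by cosine similarity. The window size, the PPMI weighting, the function names, and the toy Slovene corpus (pes 'dog', mačka 'cat', meso 'meat', avto 'car') are all illustrative assumptions.

```python
# Illustrative assumption: a generic PPMI + cosine similarity measure,
# not the measure used in the paper.
import math
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, target_pos="N", window=2):
    """Count context features for every lemma tagged `target_pos`.

    Each sentence is a list of (lemma, pos) pairs, i.e. minimally annotated
    text. A feature is a neighboring lemma plus its POS tag within `window`.
    """
    counts = defaultdict(Counter)
    for sent in sentences:
        for i, (lemma, pos) in enumerate(sent):
            if pos != target_pos:
                continue
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx_lemma, ctx_pos = sent[j]
                    counts[lemma][f"{ctx_lemma}/{ctx_pos}"] += 1
    return counts


def ppmi(counts):
    """Reweight raw counts with positive pointwise mutual information (PPMI)."""
    total = sum(sum(c.values()) for c in counts.values())
    word_totals = {w: sum(c.values()) for w, c in counts.items()}
    feat_totals = Counter()
    for c in counts.values():
        feat_totals.update(c)
    vectors = {}
    for w, c in counts.items():
        vec = {}
        for f, n in c.items():
            pmi = math.log(n * total / (word_totals[w] * feat_totals[f]))
            if pmi > 0:  # keep only positive associations
                vec[f] = pmi
        vectors[w] = vec
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v[f] for f, x in u.items() if f in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(target, vectors, k=3):
    """Return the k words whose vectors are closest to `target`'s."""
    scores = [(w, cosine(vectors[target], v)) for w, v in vectors.items() if w != target]
    return sorted(scores, key=lambda p: p[1], reverse=True)[:k]


# Hypothetical toy corpus of lemmatized, POS-tagged Slovene sentences.
toy_corpus = [
    [("pes", "N"), ("jesti", "V"), ("meso", "N")],
    [("mačka", "N"), ("jesti", "V"), ("meso", "N")],
    [("pes", "N"), ("teči", "V"), ("hitro", "R")],
    [("mačka", "N"), ("teči", "V"), ("hitro", "R")],
    [("voziti", "V"), ("avto", "N"), ("hitro", "R")],
]

vectors = ppmi(cooccurrence_counts(toy_corpus))
print(most_similar("pes", vectors))  # "mačka" ranks first: it shares all of pes's contexts
```

Because pes and mačka occur in identical contexts in the toy data, mačka ranks first; ranked lists of this kind are the sort of output the abstract says is then incorporated into the wordnet.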