Grounding sloWNet on Slovene corpus data
Wordnets can be translated from another language or built from corpus evidence. The transfer approach is easier and quicker, which is why it has been the most widely used. However, it has the major disadvantage that the resulting resource does not necessarily reflect the language in question. This is w...
Main Authors: | Darja Fišer, Maciej Piasecki, Bartosz Broda |
---|---|
Format: | Article |
Language: | English |
Published: | Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts), 2013-12-01 |
Series: | Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave |
Subjects: | lexical semantics, wordnet, semantic similarity |
Online Access: | http://www.trojina.org/slovenscina2.0/arhiv/2013/2/Slo2.0_2013_2_05.pdf |
id |
doaj-ffc1c8833ce844839bcad79f7fa937b4 |
---|---|
record_format |
Article |
spelling |
doaj-ffc1c8833ce844839bcad79f7fa937b4; 2021-04-02T06:07:15Z; eng; Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts); Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave; 2335-2736; 2013-12-01; 1282112; Grounding sloWNet on Slovene corpus data; Darja Fišer 0; Maciej Piasecki 1; Bartosz Broda 2; Faculty of Arts, Ljubljana; Institute of Informatics, Wroclaw; Institute of Informatics, Wroclaw; Wordnets can be translated from another language or built from corpus evidence. The transfer approach is easier and quicker, which is why it has been the most widely used. However, it has the major disadvantage that the resulting resource does not necessarily reflect the language in question. This is why in this paper we test a language-motivated approach that uses linguistically annotated corpus data and basic statistical methods to extract lists of semantically similar words, which are then incorporated into the wordnet for Slovene. The approach was originally developed for Polish, but because the algorithm itself is language-independent and can use minimally annotated corpus resources in any language, it is also attractive for other languages that still lack an extensive wordnet or a similar semantic lexicon. An important advantage of the approach is that it relies on real linguistic evidence harvested from a corpus, yielding a linguistically sound organization of the vocabulary. As all previous approaches to constructing the Slovene wordnet were transfer-based and relied on the English Princeton WordNet, the encouraging results obtained in the presented experiment will be a welcome complement to the existing semantic network.; http://www.trojina.org/slovenscina2.0/arhiv/2013/2/Slo2.0_2013_2_05.pdf; lexical semantics; wordnet; semantic similarity |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Darja Fišer, Maciej Piasecki, Bartosz Broda |
spellingShingle |
Darja Fišer Maciej Piasecki Bartosz Broda Grounding sloWNet on Slovene corpus data Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave lexical semantics wordnet semantic similarity |
author_facet |
Darja Fišer, Maciej Piasecki, Bartosz Broda |
author_sort |
Darja Fišer |
title |
Grounding sloWNet on Slovene corpus data |
title_short |
Grounding sloWNet on Slovene corpus data |
title_full |
Grounding sloWNet on Slovene corpus data |
title_fullStr |
Grounding sloWNet on Slovene corpus data |
title_full_unstemmed |
Grounding sloWNet on Slovene corpus data |
title_sort |
grounding slownet on slovene corpus data |
publisher |
Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts) |
series |
Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave |
issn |
2335-2736 |
publishDate |
2013-12-01 |
description |
Wordnets can be translated from another language or built from corpus evidence. The transfer approach is easier and quicker, which is why it has been the most widely used. However, it has the major disadvantage that the resulting resource does not necessarily reflect the language in question. This is why in this paper we test a language-motivated approach that uses linguistically annotated corpus data and basic statistical methods to extract lists of semantically similar words, which are then incorporated into the wordnet for Slovene. The approach was originally developed for Polish, but because the algorithm itself is language-independent and can use minimally annotated corpus resources in any language, it is also attractive for other languages that still lack an extensive wordnet or a similar semantic lexicon. An important advantage of the approach is that it relies on real linguistic evidence harvested from a corpus, yielding a linguistically sound organization of the vocabulary. As all previous approaches to constructing the Slovene wordnet were transfer-based and relied on the English Princeton WordNet, the encouraging results obtained in the presented experiment will be a welcome complement to the existing semantic network. |
topic |
lexical semantics, wordnet, semantic similarity |
url |
http://www.trojina.org/slovenscina2.0/arhiv/2013/2/Slo2.0_2013_2_05.pdf |
work_keys_str_mv |
AT darjafiser groundingslownetonslovenecorpusdata AT maciejpiasecki groundingslownetonslovenecorpusdata AT bartoszbroda groundingslownetonslovenecorpusdata |
_version_ |
1724172146801901568 |