Tagging a Morphologically Complex Language Using an Averaged Perceptron Tagger: The Case of Icelandic

In this paper, we experiment with using Stagger, an open-source implementation of an Averaged Perceptron tagger, to tag Icelandic, a morphologically complex language. By adding language-specific linguistic features and using IceMorphy, an unknown word guesser, we obtain state-of-the-art tagging accu...

Bibliographic Details
Main Author: Östling, Robert
Format: Others
Language: English
Published: Stockholms universitet, Avdelningen för datorlingvistik 2013
Subjects: part of speech tagging; pos tagging; icelandic
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-90304
id ndltd-UPSALLA1-oai-DiVA.org-su-90304
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-su-90304
2013-06-06T04:27:57Z
Tagging a Morphologically Complex Language Using an Averaged Perceptron Tagger: The Case of Icelandic
eng
Östling, Robert
Stockholms universitet, Avdelningen för datorlingvistik
Linköping University Electronic Press, Linköpings universitet
2013
part of speech tagging
pos tagging
icelandic
In this paper, we experiment with using Stagger, an open-source implementation of an Averaged Perceptron tagger, to tag Icelandic, a morphologically complex language. By adding language-specific linguistic features and using IceMorphy, an unknown word guesser, we obtain state-of-the-art tagging accuracy of 92.82%. Furthermore, by adding data from a morphological database, and word embeddings induced from an unannotated corpus, the accuracy increases to 93.84%. This is equivalent to an error reduction of 5.5%, compared to the previously best tagger for Icelandic, consisting of linguistic rules and a Hidden Markov Model.
Conference paper
info:eu-repo/semantics/conferenceObject
text
http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-90304
Linköping Electronic Conference Proceedings, 1650-3740
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), p. 105-119
application/pdf
info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic part of speech tagging
pos tagging
icelandic
description In this paper, we experiment with using Stagger, an open-source implementation of an Averaged Perceptron tagger, to tag Icelandic, a morphologically complex language. By adding language-specific linguistic features and using IceMorphy, an unknown word guesser, we obtain state-of-the-art tagging accuracy of 92.82%. Furthermore, by adding data from a morphological database, and word embeddings induced from an unannotated corpus, the accuracy increases to 93.84%. This is equivalent to an error reduction of 5.5%, compared to the previously best tagger for Icelandic, consisting of linguistic rules and a Hidden Markov Model.
author Östling, Robert
title Tagging a Morphologically Complex Language Using an Averaged Perceptron Tagger: The Case of Icelandic
publisher Stockholms universitet, Avdelningen för datorlingvistik
publishDate 2013
url http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-90304
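
The description above reports accuracies of 92.82% and 93.84% and an error reduction of 5.5% relative to the previously best tagger. The record does not state that earlier tagger's accuracy, so the following is only a minimal sketch of how such figures relate, assuming "error reduction" means the relative decrease in error rate (100 minus accuracy). The function names are hypothetical, and the baseline value it derives is an inference from the quoted numbers, not a figure taken from the record.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Relative decrease in error rate; both accuracies are given in percent."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return (baseline_err - new_err) / baseline_err


def implied_baseline_accuracy(new_acc: float, reduction: float) -> float:
    """Baseline accuracy (percent) implied by a new accuracy and a relative error reduction."""
    new_err = 100.0 - new_acc
    baseline_err = new_err / (1.0 - reduction)
    return 100.0 - baseline_err


# Gain between the two configurations reported in the description.
internal_gain = relative_error_reduction(92.82, 93.84)

# Accuracy of the earlier tagger implied by the quoted 5.5% reduction
# (an inferred value under the assumption stated above).
implied_prev = implied_baseline_accuracy(93.84, 0.055)

print(f"reduction from 92.82% to 93.84%: {internal_gain:.1%}")
print(f"implied previous-best accuracy: {implied_prev:.2f}%")
```

Under this reading, the 5.5% figure is measured against the earlier rule-based and HMM tagger, not against the 92.82% configuration, which is why the two reductions differ.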