Coping with Variation in the Icelandic Diachronic Treebank

Bibliographic Details
Main Authors: Eiríkur Rögnvaldsson, Anton Karl Ingason, Einar Freyr Sigurðsson
Format: Article
Language: English
Published: University of Oslo 2011-06-01
Series: Oslo Studies in Language
Online Access: https://journals.uio.no/osla/article/view/104

Description
Summary: We present an overview of an ongoing project that aims to develop methods for building a treebank of Icelandic. The treebank will contain both written and spoken language and will also have a diachronic dimension. Since Icelandic is an example of what has been called a less-resourced language in computational linguistics and language technology, it is essential to use the limited resources available as economically and efficiently as possible. We emphasize the importance of open-source software and of the interplay between linguistic knowledge and technological skills. We describe the workflow used to construct the treebank and show how the different software tools work together to produce the final representation. Finally, we show how the treebank can be used to study some well-known phenomena in Icelandic syntax.
ISSN: 1890-9639