LatMor: A Latin Finite-State Morphology Encoding Vowel Quantity
We present the first large-coverage finite-state open-source morphology for Latin (called LatMor) which parses as well as generates vowel quantity information. LatMor is based on the Berlin Latin Lexicon comprising about 70,000 lemmata of classical Latin compiled by the group of Dietmar Najock in th...
Main Authors: | Springmann Uwe, Schmid Helmut, Najock Dietmar |
---|---|
Format: | Article |
Language: | English |
Published: | De Gruyter 2016-10-01 |
Series: | Open Linguistics |
Subjects: | |
Online Access: | http://www.degruyter.com/view/j/opli.2016.2.issue-1/opli-2016-0019/opli-2016-0019.xml?format=INT |
id |
doaj-e37e8eb68c8f417c8cd5c31684105384 |
---|---|
record_format |
Article |
spelling |
doaj-e37e8eb68c8f417c8cd5c31684105384 2021-10-02T05:51:48Z eng De Gruyter Open Linguistics 2300-9969 2016-10-01 2 1 10.1515/opli-2016-0019 opli-2016-0019 LatMor: A Latin Finite-State Morphology Encoding Vowel Quantity Springmann Uwe 0 Schmid Helmut 1 Najock Dietmar 2 Centrum für Informations- und Sprachverarbeitung, Ludwig-Maximilians-Universität München Centrum für Informations- und Sprachverarbeitung, Ludwig-Maximilians-Universität München Institut für Griechische und Lateinische Philologie, Freie Universität Berlin We present the first large-coverage finite-state open-source morphology for Latin (called LatMor) which parses as well as generates vowel quantity information. LatMor is based on the Berlin Latin Lexicon comprising about 70,000 lemmata of classical Latin compiled by the group of Dietmar Najock in their work on concordances of Latin authors (see Rapsch and Najock, 1991) which was recently updated by us. Compared to the well-known Morpheus system of Crane (1991, 1998), which is written in the C programming language, based on 50,000 lemmata of Lewis and Short (1907), not well documented and therefore not easily extended, our new morphology has a larger vocabulary, is about 60 to 1200 times faster and is built in the form of finite-state transducers which can analyze as well as generate wordforms and represent the state-of-the-art implementation method in computational morphology. The current coverage of LatMor is evaluated against Morpheus and other existing systems (some of which are not openly accessible), and is shown to rank first among all systems together with the Pisa LEMLAT morphology (not yet openly accessible). Recall has been analyzed taking the Latin Dependency Treebank as gold data and the remaining defect classes have been identified.
LatMor is available under an open source licence to allow its wide usage by all interested parties. http://www.degruyter.com/view/j/opli.2016.2.issue-1/opli-2016-0019/opli-2016-0019.xml?format=INT morphology finite state methods Latin historical linguistics |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Springmann Uwe Schmid Helmut Najock Dietmar |
author_sort |
Springmann Uwe |
title |
LatMor: A Latin Finite-State Morphology Encoding Vowel Quantity |
publisher |
De Gruyter |
series |
Open Linguistics |
issn |
2300-9969 |
publishDate |
2016-10-01 |
description |
We present the first large-coverage finite-state open-source morphology for Latin (called LatMor)
which parses as well as generates vowel quantity information. LatMor is based on the Berlin Latin Lexicon
comprising about 70,000 lemmata of classical Latin compiled by the group of Dietmar Najock in their work on
concordances of Latin authors (see Rapsch and Najock, 1991) which was recently updated by us. Compared
to the well-known Morpheus system of Crane (1991, 1998), which is written in the C programming language,
based on 50,000 lemmata of Lewis and Short (1907), not well documented and therefore not easily extended,
our new morphology has a larger vocabulary, is about 60 to 1200 times faster and is built in the form of
finite-state transducers which can analyze as well as generate wordforms and represent the state-of-the-art
implementation method in computational morphology. The current coverage of LatMor is evaluated against
Morpheus and other existing systems (some of which are not openly accessible), and is shown to rank first
among all systems together with the Pisa LEMLAT morphology (not yet openly accessible). Recall has been
analyzed taking the Latin Dependency Treebank as gold data and the remaining defect classes have been
identified. LatMor is available under an open source licence to allow its wide usage by all interested parties. |
topic |
morphology finite state methods Latin historical linguistics |
url |
http://www.degruyter.com/view/j/opli.2016.2.issue-1/opli-2016-0019/opli-2016-0019.xml?format=INT |