LatMor: A Latin Finite-State Morphology Encoding Vowel Quantity

We present the first large-coverage finite-state open-source morphology for Latin (called LatMor) which parses as well as generates vowel quantity information. LatMor is based on the Berlin Latin Lexicon comprising about 70,000 lemmata of classical Latin compiled by the group of Dietmar Najock in th...

Bibliographic Details
Main Authors: Uwe Springmann, Helmut Schmid, Dietmar Najock
Format: Article
Language: English
Published: De Gruyter 2016-10-01
Series: Open Linguistics
Subjects: morphology; finite state methods; Latin; historical linguistics
Online Access: http://www.degruyter.com/view/j/opli.2016.2.issue-1/opli-2016-0019/opli-2016-0019.xml?format=INT
id doaj-e37e8eb68c8f417c8cd5c31684105384
record_format Article
spelling doaj-e37e8eb68c8f417c8cd5c31684105384 | 2021-10-02T05:51:48Z | eng | De Gruyter | Open Linguistics | 2300-9969 | 2016-10-01 | 2 | 1 | 10.1515/opli-2016-0019 | opli-2016-0019 | LatMor: A Latin Finite-State Morphology Encoding Vowel Quantity | Springmann Uwe 0 | Schmid Helmut 1 | Najock Dietmar 2 | Centrum für Informations- und Sprachverarbeitung, Ludwig-Maximilians-Universität München | Centrum für Informations- und Sprachverarbeitung, Ludwig-Maximilians-Universität München | Institut für Griechische und Lateinische Philologie, Freie Universität Berlin | http://www.degruyter.com/view/j/opli.2016.2.issue-1/opli-2016-0019/opli-2016-0019.xml?format=INT | morphology | finite state methods | Latin | historical linguistics
collection DOAJ
language English
format Article
sources DOAJ
author Springmann Uwe
Schmid Helmut
Najock Dietmar
author_sort Springmann Uwe
title LatMor: A Latin Finite-State Morphology Encoding Vowel Quantity
title_sort latmor: a latin finite-state morphology encoding vowel quantity
publisher De Gruyter
series Open Linguistics
issn 2300-9969
publishDate 2016-10-01
description We present the first large-coverage finite-state open-source morphology for Latin (called LatMor) which parses as well as generates vowel quantity information. LatMor is based on the Berlin Latin Lexicon comprising about 70,000 lemmata of classical Latin compiled by the group of Dietmar Najock in their work on concordances of Latin authors (see Rapsch and Najock, 1991) which was recently updated by us. Compared to the well-known Morpheus system of Crane (1991, 1998), which is written in the C programming language, based on 50,000 lemmata of Lewis and Short (1907), not well documented and therefore not easily extended, our new morphology has a larger vocabulary, is about 60 to 1200 times faster and is built in the form of finite-state transducers which can analyze as well as generate wordforms and represent the state-of-the-art implementation method in computational morphology. The current coverage of LatMor is evaluated against Morpheus and other existing systems (some of which are not openly accessible), and is shown to rank first among all systems together with the Pisa LEMLAT morphology (not yet openly accessible). Recall has been analyzed taking the Latin Dependency Treebank as gold data and the remaining defect classes have been identified. LatMor is available under an open source licence to allow its wide usage by all interested parties.
topic morphology
finite state methods
Latin
historical linguistics
url http://www.degruyter.com/view/j/opli.2016.2.issue-1/opli-2016-0019/opli-2016-0019.xml?format=INT