RuLearn: an Open-source Toolkit for the Automatic Inference of Shallow-transfer Rules for Machine Translation

This paper presents ruLearn, an open-source toolkit for the automatic inference of rules for shallow-transfer machine translation from scarce parallel corpora and morphological dictionaries. ruLearn will make rule-based machine translation a very appealing alternative for under-resourced language pa...

Bibliographic Details
Main Authors: Sánchez-Cartagena Víctor M., Pérez-Ortiz Juan Antonio, Sánchez-Martínez Felipe
Format: Article
Language: English
Published: Sciendo 2016-10-01
Series: Prague Bulletin of Mathematical Linguistics
Online Access: https://doi.org/10.1515/pralin-2016-0018
id doaj-98d0d649c90f44769128720f137951c9
record_format Article
spelling doaj-98d0d649c90f44769128720f137951c9; 2021-09-05T13:59:53Z; eng; Sciendo; Prague Bulletin of Mathematical Linguistics; ISSN 1804-0462; 2016-10-01; volume 106, issue 1, pages 193-204; 10.1515/pralin-2016-0018; RuLearn: an Open-source Toolkit for the Automatic Inference of Shallow-transfer Rules for Machine Translation; Sánchez-Cartagena Víctor M. (Prompsit Language Engineering, Spain); Pérez-Ortiz Juan Antonio (Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, Spain); Sánchez-Martínez Felipe (Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, Spain); https://doi.org/10.1515/pralin-2016-0018
collection DOAJ
language English
format Article
sources DOAJ
author Sánchez-Cartagena Víctor M.
Pérez-Ortiz Juan Antonio
Sánchez-Martínez Felipe
title RuLearn: an Open-source Toolkit for the Automatic Inference of Shallow-transfer Rules for Machine Translation
publisher Sciendo
series Prague Bulletin of Mathematical Linguistics
issn 1804-0462
publishDate 2016-10-01
description This paper presents ruLearn, an open-source toolkit for the automatic inference of rules for shallow-transfer machine translation from scarce parallel corpora and morphological dictionaries. ruLearn can make rule-based machine translation an appealing alternative for under-resourced language pairs because it avoids the need for human experts to handcraft transfer rules and, in contrast to statistical machine translation, requires only a small parallel corpus (a few hundred parallel sentences proved to be sufficient). The inference algorithm implemented by ruLearn was recently published by the same authors in Computer Speech & Language (volume 32). It produces rules whose translation quality is similar to that obtained with hand-crafted rules. ruLearn generates rules that are ready for use in the Apertium platform, although they can be easily adapted to other platforms. When the rules produced by ruLearn are combined with a hybridisation strategy that integrates linguistic resources from shallow-transfer rule-based machine translation into phrase-based statistical machine translation (published by the same authors in Journal of Artificial Intelligence Research, volume 55), they help to mitigate data sparseness. This paper also shows how to use ruLearn and describes its implementation.
url https://doi.org/10.1515/pralin-2016-0018