RuLearn: an Open-source Toolkit for the Automatic Inference of Shallow-transfer Rules for Machine Translation
This paper presents ruLearn, an open-source toolkit for the automatic inference of rules for shallow-transfer machine translation from scarce parallel corpora and morphological dictionaries. ruLearn will make rule-based machine translation a very appealing alternative for under-resourced language pa...
Main Authors: | Sánchez-Cartagena Víctor M., Pérez-Ortiz Juan Antonio, Sánchez-Martínez Felipe |
---|---|
Format: | Article |
Language: | English |
Published: | Sciendo 2016-10-01 |
Series: | Prague Bulletin of Mathematical Linguistics |
Online Access: | https://doi.org/10.1515/pralin-2016-0018 |
id |
doaj-98d0d649c90f44769128720f137951c9 |
---|---|
record_format |
Article |
spelling |
doaj-98d0d649c90f44769128720f137951c9; 2021-09-05T13:59:53Z; eng; Sciendo; Prague Bulletin of Mathematical Linguistics; 1804-0462; 2016-10-01; 106; 1; 193; 204; 10.1515/pralin-2016-0018; pralin-2016-0018; RuLearn: an Open-source Toolkit for the Automatic Inference of Shallow-transfer Rules for Machine Translation; Sánchez-Cartagena Víctor M. 0; Pérez-Ortiz Juan Antonio 1; Sánchez-Martínez Felipe 2; Prompsit Language Engineering, Spain; Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, Spain; Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, Spain; This paper presents ruLearn, an open-source toolkit for the automatic inference of rules for shallow-transfer machine translation from scarce parallel corpora and morphological dictionaries. ruLearn will make rule-based machine translation a very appealing alternative for under-resourced language pairs because it avoids the need for human experts to handcraft transfer rules and requires, in contrast to statistical machine translation, only a small parallel corpus (a few hundred parallel sentences proved to be sufficient). The inference algorithm implemented by ruLearn was recently published by the same authors in Computer Speech & Language (volume 32). It produces rules whose translation quality is similar to that obtained with hand-crafted rules. ruLearn generates rules that are ready for use in the Apertium platform, although they can be easily adapted to other platforms. When the rules produced by ruLearn are used together with a hybridisation strategy for integrating linguistic resources from shallow-transfer rule-based machine translation into phrase-based statistical machine translation (published by the same authors in Journal of Artificial Intelligence Research, volume 55), they help to mitigate data sparseness. This paper also shows how to use ruLearn and describes its implementation.; https://doi.org/10.1515/pralin-2016-0018 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Sánchez-Cartagena Víctor M. Pérez-Ortiz Juan Antonio Sánchez-Martínez Felipe |
spellingShingle |
Sánchez-Cartagena Víctor M. Pérez-Ortiz Juan Antonio Sánchez-Martínez Felipe RuLearn: an Open-source Toolkit for the Automatic Inference of Shallow-transfer Rules for Machine Translation Prague Bulletin of Mathematical Linguistics |
author_facet |
Sánchez-Cartagena Víctor M. Pérez-Ortiz Juan Antonio Sánchez-Martínez Felipe |
author_sort |
Sánchez-Cartagena Víctor M. |
title |
RuLearn: an Open-source Toolkit for the Automatic Inference of Shallow-transfer Rules for Machine Translation |
title_short |
RuLearn: an Open-source Toolkit for the Automatic Inference of Shallow-transfer Rules for Machine Translation |
title_full |
RuLearn: an Open-source Toolkit for the Automatic Inference of Shallow-transfer Rules for Machine Translation |
title_fullStr |
RuLearn: an Open-source Toolkit for the Automatic Inference of Shallow-transfer Rules for Machine Translation |
title_full_unstemmed |
RuLearn: an Open-source Toolkit for the Automatic Inference of Shallow-transfer Rules for Machine Translation |
title_sort |
rulearn: an open-source toolkit for the automatic inference of shallow-transfer rules for machine translation |
publisher |
Sciendo |
series |
Prague Bulletin of Mathematical Linguistics |
issn |
1804-0462 |
publishDate |
2016-10-01 |
description |
This paper presents ruLearn, an open-source toolkit for the automatic inference of rules for shallow-transfer machine translation from scarce parallel corpora and morphological dictionaries. ruLearn will make rule-based machine translation a very appealing alternative for under-resourced language pairs because it avoids the need for human experts to handcraft transfer rules and requires, in contrast to statistical machine translation, only a small parallel corpus (a few hundred parallel sentences proved to be sufficient). The inference algorithm implemented by ruLearn was recently published by the same authors in Computer Speech & Language (volume 32). It produces rules whose translation quality is similar to that obtained with hand-crafted rules. ruLearn generates rules that are ready for use in the Apertium platform, although they can be easily adapted to other platforms. When the rules produced by ruLearn are used together with a hybridisation strategy for integrating linguistic resources from shallow-transfer rule-based machine translation into phrase-based statistical machine translation (published by the same authors in Journal of Artificial Intelligence Research, volume 55), they help to mitigate data sparseness. This paper also shows how to use ruLearn and describes its implementation. |
url |
https://doi.org/10.1515/pralin-2016-0018 |
work_keys_str_mv |
AT sanchezcartagenavictorm rulearnanopensourcetoolkitfortheautomaticinferenceofshallowtransferrulesformachinetranslation AT perezortizjuanantonio rulearnanopensourcetoolkitfortheautomaticinferenceofshallowtransferrulesformachinetranslation AT sanchezmartinezfelipe rulearnanopensourcetoolkitfortheautomaticinferenceofshallowtransferrulesformachinetranslation |
_version_ |
1717812837082988544 |