Towards Optimizing MT for Post-Editing Effort: Can BLEU Still Be Useful?

We propose a simple, linear-combination automatic evaluation measure (AEM) to approximate post-editing (PE) effort. Effort is measured both as PE time and as the number of PE operations performed. The ultimate goal is to define an AEM that can be used to optimize machine translation (MT) systems to...

Bibliographic Details
Main Authors: Forcada Mikel L., Sánchez-Martínez Felipe, Esplà-Gomis Miquel, Specia Lucia
Format: Article
Language: English
Published: Sciendo 2017-06-01
Series: Prague Bulletin of Mathematical Linguistics
Online Access: https://doi.org/10.1515/pralin-2017-0019
id doaj-40f7759398754624ae8baf6dca0dbad5
record_format Article
spelling doaj-40f7759398754624ae8baf6dca0dbad5 2021-09-05T13:59:53Z eng
Sciendo
Prague Bulletin of Mathematical Linguistics, ISSN 1804-0462
2017-06-01, volume 108, issue 1, pages 183-195
10.1515/pralin-2017-0019 (pralin-2017-0019)
Towards Optimizing MT for Post-Editing Effort: Can BLEU Still Be Useful?
Forcada Mikel L. (Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, E-03690 Sant Vicent del Raspeig, Spain)
Sánchez-Martínez Felipe (Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, E-03690 Sant Vicent del Raspeig, Spain)
Esplà-Gomis Miquel (Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, E-03690 Sant Vicent del Raspeig, Spain)
Specia Lucia (Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield, United Kingdom of Great Britain and Northern Ireland)
We propose a simple, linear-combination automatic evaluation measure (AEM) to approximate post-editing (PE) effort. Effort is measured both as PE time and as the number of PE operations performed. The ultimate goal is to define an AEM that can be used to optimize machine translation (MT) systems to minimize PE effort, but without having to perform unfeasible repeated PE during optimization. As PE effort is expected to be an extensive magnitude (i.e., one growing linearly with the sentence length and which may be simply added to represent the effort for a set of sentences), we use a linear combination of extensive and pseudo-extensive features. One such pseudo-extensive feature, 1–BLEU times the length of the reference, proves to be almost as good a predictor of PE effort as the best combination of extensive features. Surprisingly, effort predictors computed using independently obtained reference translations perform reasonably close to those using actual post-edited references.
In the early stage of this research and given the inherent complexity of carrying out experiments with professional post-editors, we decided to carry out an automatic evaluation of the AEMs proposed rather than a manual evaluation to measure the effort needed to post-edit the output of an MT system tuned on these AEMs. The results obtained seem to support current tuning practice using BLEU, yet pointing at some limitations. Apart from this intrinsic evaluation, an extrinsic evaluation was also carried out in which the AEMs proposed were used to build synthetic training corpora for MT quality estimation, with results comparable to those obtained when training with measured PE efforts.
https://doi.org/10.1515/pralin-2017-0019
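The pseudo-extensive feature named in the abstract, 1–BLEU times the length of the reference, and the idea of a linear combination of features can be illustrated with a minimal Python sketch. This is not the authors' implementation: the add-one-smoothed sentence-level BLEU, the second feature (`ref_length`), the function names, and the weight values are all assumptions made for illustration only.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp, ref, max_n=4):
    """Add-one-smoothed sentence-level BLEU in [0, 1] (the smoothing is an assumption)."""
    if not hyp or not ref:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngram_counts(hyp, n), ngram_counts(ref, n)
        # Clipped matches: a hypothesis n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        log_precision += math.log((matches + 1) / (total + 1)) / max_n
    # Brevity penalty for hypotheses shorter than the reference.
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(log_precision)


def bleu_feature(hyp, ref):
    """Pseudo-extensive feature from the abstract: (1 - BLEU) times reference length."""
    return (1.0 - sentence_bleu(hyp, ref)) * len(ref)


def predict_effort(hyp, ref, weights):
    """Linear combination of features; the feature set and weights are hypothetical."""
    features = {"bleu_feature": bleu_feature(hyp, ref), "ref_length": float(len(ref))}
    return sum(weights[name] * value for name, value in features.items())


def corpus_effort(pairs, weights):
    """Effort is treated as extensive: the corpus estimate is the sum over sentences."""
    return sum(predict_effort(h, r, weights) for h, r in pairs)


weights = {"bleu_feature": 1.5, "ref_length": 0.2}  # placeholder values, not fitted to any data
mt = "the cat sat in the mat".split()   # hypothetical MT output
pe = "the cat sat on the mat".split()   # hypothetical post-edited reference
print(predict_effort(mt, pe, weights))
```

In this sketch an MT output identical to its reference contributes nothing through the BLEU-based feature, and the estimate for a set of sentences is simply the sum of the per-sentence estimates, which is the additivity the abstract describes for an extensive magnitude.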
collection DOAJ
language English
format Article
sources DOAJ
author Forcada Mikel L.
Sánchez-Martínez Felipe
Esplà-Gomis Miquel
Specia Lucia
title Towards Optimizing MT for Post-Editing Effort: Can BLEU Still Be Useful?
publisher Sciendo
series Prague Bulletin of Mathematical Linguistics
issn 1804-0462
publishDate 2017-06-01
url https://doi.org/10.1515/pralin-2017-0019