Towards Optimizing MT for Post-Editing Effort: Can BLEU Still Be Useful?
We propose a simple, linear-combination automatic evaluation measure (AEM) to approximate post-editing (PE) effort. Effort is measured both as PE time and as the number of PE operations performed. The ultimate goal is to define an AEM that can be used to optimize machine translation (MT) systems to...
Main Authors: | Forcada, Mikel L.; Sánchez-Martínez, Felipe; Esplà-Gomis, Miquel; Specia, Lucia |
---|---|
Format: | Article |
Language: | English |
Published: | Sciendo, 2017-06-01 |
Series: | Prague Bulletin of Mathematical Linguistics |
Online Access: | https://doi.org/10.1515/pralin-2017-0019 |
id |
doaj-40f7759398754624ae8baf6dca0dbad5 |
---|---|
record_format |
Article |
spelling |
doaj-40f7759398754624ae8baf6dca0dbad5; 2021-09-05T13:59:53Z; eng; Sciendo; Prague Bulletin of Mathematical Linguistics, ISSN 1804-0462; 2017-06-01; volume 108, issue 1, pages 183–195; 10.1515/pralin-2017-0019; pralin-2017-0019; Towards Optimizing MT for Post-Editing Effort: Can BLEU Still Be Useful?; Forcada Mikel L. (0), Sánchez-Martínez Felipe (1), Esplà-Gomis Miquel (2), Specia Lucia (3); (0), (1), (2): Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, E-03690 Sant Vicent del Raspeig, Spain; (3): Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield, United Kingdom of Great Britain and Northern Ireland; https://doi.org/10.1515/pralin-2017-0019 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Forcada Mikel L. Sánchez-Martínez Felipe Esplà-Gomis Miquel Specia Lucia |
title |
Towards Optimizing MT for Post-Editing Effort: Can BLEU Still Be Useful? |
publisher |
Sciendo |
series |
Prague Bulletin of Mathematical Linguistics |
issn |
1804-0462 |
publishDate |
2017-06-01 |
description |
We propose a simple, linear-combination automatic evaluation measure (AEM) to approximate post-editing (PE) effort. Effort is measured both as PE time and as the number of PE operations performed. The ultimate goal is to define an AEM that can be used to optimize machine translation (MT) systems to minimize PE effort without having to repeat PE during optimization, which would be infeasible. As PE effort is expected to be an extensive magnitude (i.e., one that grows linearly with sentence length and can simply be added up to represent the effort for a set of sentences), we use a linear combination of extensive and pseudo-extensive features. One such pseudo-extensive feature, (1 − BLEU) times the length of the reference, proves to be almost as good a predictor of PE effort as the best combination of extensive features. Surprisingly, effort predictors computed using independently obtained reference translations perform reasonably close to those using actual post-edited references. At this early stage of the research, and given the inherent complexity of carrying out experiments with professional post-editors, we decided to evaluate the proposed AEMs automatically rather than manually measuring the effort needed to post-edit the output of an MT system tuned on these AEMs. The results obtained seem to support current tuning practice using BLEU, while pointing to some limitations. Apart from this intrinsic evaluation, an extrinsic evaluation was also carried out in which the proposed AEMs were used to build synthetic training corpora for MT quality estimation, with results comparable to those obtained when training with measured PE efforts. |
url |
https://doi.org/10.1515/pralin-2017-0019 |
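The pseudo-extensive feature named in the description, (1 − BLEU) times the reference length, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the sample values, and the choice of a single-weight least-squares fit with no intercept are assumptions made here for illustration. BLEU scores are taken as given inputs rather than computed.

```python
from typing import Sequence


def pseudo_extensive_feature(bleu: float, ref_len: int) -> float:
    """Return (1 - BLEU) * reference length for one sentence.

    Hypothetical helper: BLEU is assumed to be a score in [0, 1].
    """
    if not 0.0 <= bleu <= 1.0:
        raise ValueError("BLEU must lie in [0, 1]")
    return (1.0 - bleu) * ref_len


def fit_weight(features: Sequence[float], efforts: Sequence[float]) -> float:
    """Least-squares weight w for the model effort ~ w * feature.

    Assumption: no intercept term, so a zero-valued feature predicts
    zero effort and predictions add up across sentences.
    """
    numerator = sum(f * e for f, e in zip(features, efforts))
    denominator = sum(f * f for f in features)
    if denominator == 0:
        raise ValueError("all features are zero; weight is undefined")
    return numerator / denominator


def predict_corpus_effort(weight: float, features: Sequence[float]) -> float:
    """Predicted effort for a set of sentences: the sum of per-sentence predictions."""
    return weight * sum(features)


# Made-up sample data, for illustration only.
bleu_scores = [0.5, 0.75, 0.0]
ref_lengths = [10, 20, 4]
measured_efforts = [10.0, 10.0, 8.0]  # e.g. seconds of PE time (hypothetical)

features = [pseudo_extensive_feature(b, n) for b, n in zip(bleu_scores, ref_lengths)]
weight = fit_weight(features, measured_efforts)
corpus_effort = predict_corpus_effort(weight, features)
```

Omitting the intercept keeps the predictor extensive in the sense used in the description: the predicted effort for a set of sentences equals the sum of the per-sentence predictions.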