Towards Optimizing MT for Post-Editing Effort: Can BLEU Still Be Useful?

We propose a simple, linear-combination automatic evaluation measure (AEM) to approximate post-editing (PE) effort. Effort is measured both as PE time and as the number of PE operations performed. The ultimate goal is to define an AEM that can be used to optimize machine translation (MT) systems to...

Bibliographic Details
Main Authors:	Forcada, Mikel L.; Sánchez-Martínez, Felipe; Esplà-Gomis, Miquel; Specia, Lucia
Format:	Article
Language:	English
Published:	Sciendo 2017-06-01
Series:	Prague Bulletin of Mathematical Linguistics
Online Access:	https://doi.org/10.1515/pralin-2017-0019

id	doaj-40f7759398754624ae8baf6dca0dbad5
record_format	Article
spelling	doaj-40f7759398754624ae8baf6dca0dbad5; 2021-09-05T13:59:53Z; eng; Sciendo; Prague Bulletin of Mathematical Linguistics; 1804-0462; 2017-06-01; 108; 1; 183; 195; 10.1515/pralin-2017-0019; pralin-2017-0019; Towards Optimizing MT for Post-Editing Effort: Can BLEU Still Be Useful?; Forcada Mikel L. 0; Sánchez-Martínez Felipe 1; Esplà-Gomis Miquel 2; Specia Lucia 3; Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, E-03690 Sant Vicent del Raspeig, Spain; Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, E-03690 Sant Vicent del Raspeig, Spain; Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, E-03690 Sant Vicent del Raspeig, Spain; Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield, United Kingdom of Great Britain and Northern Ireland; https://doi.org/10.1515/pralin-2017-0019
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Forcada Mikel L. Sánchez-Martínez Felipe Esplà-Gomis Miquel Specia Lucia
author_sort	Forcada Mikel L.
title	Towards Optimizing MT for Post-Editing Effort: Can BLEU Still Be Useful?
title_sort	towards optimizing mt for post-editing effort: can bleu still be useful?
publisher	Sciendo
series	Prague Bulletin of Mathematical Linguistics
issn	1804-0462
publishDate	2017-06-01
description	We propose a simple, linear-combination automatic evaluation measure (AEM) to approximate post-editing (PE) effort. Effort is measured both as PE time and as the number of PE operations performed. The ultimate goal is to define an AEM that can be used to optimize machine translation (MT) systems to minimize PE effort without having to perform unfeasible repeated PE during optimization. As PE effort is expected to be an extensive quantity (i.e., one that grows linearly with sentence length and can simply be added up to represent the effort for a set of sentences), we use a linear combination of extensive and pseudo-extensive features. One such pseudo-extensive feature, (1 − BLEU) multiplied by the length of the reference, proves to be almost as good a predictor of PE effort as the best combination of extensive features. Surprisingly, effort predictors computed using independently obtained reference translations perform reasonably close to those using actual post-edited references. At this early stage of the research, and given the inherent complexity of carrying out experiments with professional post-editors, we decided to evaluate the proposed AEMs automatically rather than manually measure the effort needed to post-edit the output of an MT system tuned on these AEMs. The results obtained seem to support current tuning practice using BLEU, while also pointing to some limitations. Apart from this intrinsic evaluation, an extrinsic evaluation was also carried out in which the proposed AEMs were used to build synthetic training corpora for MT quality estimation, with results comparable to those obtained when training with measured PE efforts.
url	https://doi.org/10.1515/pralin-2017-0019
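
The description in this record mentions a pseudo-extensive feature, (1 − BLEU) times the reference length, and a linear combination of such features as an effort predictor. The following Python sketch illustrates that arithmetic only. It is not the authors' implementation: the function names, the absence of an intercept term, and the weight and BLEU values used in the usage note are all assumptions made for illustration.

```python
from typing import Sequence


def bleu_feature(bleu: float, ref_len: int) -> float:
    """Pseudo-extensive feature: (1 - BLEU) scaled by the reference length.

    `bleu` is a sentence-level BLEU score in [0, 1], computed elsewhere.
    """
    if not 0.0 <= bleu <= 1.0:
        raise ValueError("BLEU must lie in [0, 1]")
    if ref_len < 0:
        raise ValueError("reference length must be non-negative")
    return (1.0 - bleu) * ref_len


def predict_effort(features: Sequence[float], weights: Sequence[float]) -> float:
    """Linear combination of features for one sentence.

    Assumption: no intercept, so a sentence whose features are all zero
    is predicted to need zero effort.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * f for w, f in zip(weights, features))


def corpus_effort(
    sentence_features: Sequence[Sequence[float]], weights: Sequence[float]
) -> float:
    """Effort for a set of sentences: the sum of per-sentence predictions.

    This additivity is what treating effort as an extensive quantity means.
    """
    return sum(predict_effort(f, weights) for f in sentence_features)
```

As a usage example with made-up numbers, `corpus_effort([[bleu_feature(0.5, 20)], [bleu_feature(0.75, 8)]], [0.5])` adds the predictions for two sentences under a single hypothetical weight of 0.5; in practice the weights would be fitted against measured PE time or PE operation counts.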

Similar Items