Generating Alignments Using Target Foresight in Attention-Based Neural Machine Translation

Neural machine translation (NMT) has shown large improvements in recent years. The currently most successful approach in this area relies on the attention mechanism, which is often interpreted as an alignment even though it is computed without explicit knowledge of the target word. This limitation is the most likely reason that the quality of attention-based alignments is inferior to that of traditional alignment methods. Guided alignment training has shown that alignments are still capable of improving translation quality. In this work, we propose an extension of the attention-based NMT model that introduces target information into the attention mechanism to produce high-quality alignments. Compared to conventional attention-based alignments, our model halves the alignment error rate (AER), an absolute improvement of 19.1% AER. Compared to GIZA++, it shows an absolute improvement of 2.0% AER.

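The following Python sketch illustrates the idea described in the abstract. It is not the authors' implementation; all names (attention_weights, W_enc, W_dec, W_tgt, v) are hypothetical. It extends additive (Bahdanau-style) attention with the embedding of the current target word, the "foresight" signal that conventional attention lacks:

```python
# Illustrative sketch, NOT the paper's released code: additive attention
# whose energies also see the current target word ("target foresight").
# All parameter names (W_enc, W_dec, W_tgt, v) are hypothetical.
import numpy as np

def attention_weights(enc_states, dec_state, tgt_embed,
                      W_enc, W_dec, W_tgt, v):
    """One decoder step of additive attention with target foresight.

    enc_states: (J, d_enc) encoder states h_1..h_J
    dec_state:  (d_dec,)   decoder state s_i
    tgt_embed:  (d_emb,)   embedding of the target word y_i -- the extra
                           information a conventional attention model lacks
    """
    # Conventional attention uses only enc_states and dec_state; adding
    # tgt_embed lets the resulting weights behave like a word alignment.
    pre = np.tanh(enc_states @ W_enc + dec_state @ W_dec + tgt_embed @ W_tgt)
    energies = pre @ v                     # (J,) one score per source word
    e = np.exp(energies - energies.max())  # numerically stable softmax
    return e / e.sum()
```

For reference, the AER figures quoted above follow the standard Och & Ney definition over sure (S) and possible (P) gold links; a minimal computation:

```python
def aer(hyp_links, sure_links, possible_links):
    """Alignment error rate (Och & Ney):
    AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), with S a subset of P.
    Links are (source_pos, target_pos) pairs."""
    a, s = set(hyp_links), set(sure_links)
    p = set(possible_links) | s            # sure links are also possible
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))
```

Halving the AER while gaining 19.1% absolute implies a baseline of roughly 38% AER for conventional attention-based alignments.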

Bibliographic Details
Main Authors: Peter, Jan-Thorsten; Nix, Arne; Ney, Hermann
Format: Article
Language: English
Published: Sciendo, 2017-06-01
Series: Prague Bulletin of Mathematical Linguistics
Online Access: https://doi.org/10.1515/pralin-2017-0006
ISSN: 1804-0462
Volume/Issue: 108 (1), pp. 27-36
DOI: 10.1515/pralin-2017-0006
Affiliation (all authors): Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Ahornstr. 55, 52056 Aachen, Germany