Generating Alignments Using Target Foresight in Attention-Based Neural Machine Translation
Neural machine translation (NMT) has shown large improvements in recent years. The currently most successful approach in this area relies on the attention mechanism, which is often interpreted as an alignment, even though it is computed without explicit knowledge of the target word. This limitation is the most likely reason that the quality of attention-based alignments is inferior to the quality of traditional alignment methods. Guided alignment training has shown that alignments are still capable of improving translation quality. In this work, we propose an extension of the attention-based NMT model that introduces target information into the attention mechanism to produce high-quality alignments. In comparison to the conventional attention-based alignments, our model halves the alignment error rate (AER) with an absolute improvement of 19.1% AER. Compared to GIZA++, it shows an absolute improvement of 2.0% AER.
Main Authors: | Jan-Thorsten Peter, Arne Nix, Hermann Ney |
---|---|
Affiliation: | Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Ahornstr. 55, 52056 Aachen, Germany |
Format: | Article |
Language: | English |
Published: | Sciendo, 2017-06-01 |
Series: | Prague Bulletin of Mathematical Linguistics, Vol. 108, No. 1, pp. 27–36 |
ISSN: | 1804-0462 |
Online Access: | https://doi.org/10.1515/pralin-2017-0006 |
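
The abstract describes extending attention so that the score for each source position also conditions on the target word being produced (the "foresight"). Below is a minimal NumPy sketch of that idea on top of additive, Bahdanau-style attention; the weight names (`W_s`, `W_h`, `W_y`, `v`), the dimensions, and the exact way the target embedding enters the energy are illustrative assumptions, not the paper's actual parameterization.

```python
# Minimal sketch: additive attention with "target foresight".
# The energy for source position j conditions on the previous decoder
# state, the encoder state, AND the embedding of the target word being
# aligned. All parameter names and shapes are illustrative assumptions.
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())  # subtract max for numerical stability
    return z / z.sum()

def foresight_attention(s_prev, H, y_emb, W_s, W_h, W_y, v):
    """Alignment distribution over the J source positions.

    s_prev: (d,)     previous decoder state
    H:      (J, d_h) encoder states, one per source position
    y_emb:  (d_e,)   embedding of the current target word (the foresight)
    """
    # Standard additive energies plus an extra target-word term.
    energies = np.tanh(H @ W_h.T + s_prev @ W_s.T + y_emb @ W_y.T) @ v
    return softmax(energies)  # (J,) attention weights, sum to 1

# Toy usage with random parameters.
rng = np.random.default_rng(0)
J, d, d_h, d_e, d_a = 5, 8, 8, 6, 10
alpha = foresight_attention(
    rng.normal(size=d), rng.normal(size=(J, d_h)), rng.normal(size=d_e),
    rng.normal(size=(d_a, d)), rng.normal(size=(d_a, d_h)),
    rng.normal(size=(d_a, d_e)), rng.normal(size=d_a),
)
print(alpha.round(3), alpha.sum())
```

Because the true target word is only available at training time, a model like this yields alignments for training or forced-decoding scenarios (as in guided alignment training) rather than a drop-in replacement for attention during free translation.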