Training Tips for the Transformer Model

This article describes our experiments in neural machine translation using the recent Tensor2Tensor framework and the Transformer sequence-to-sequence model (Vaswani et al., 2017). We examine some of the critical parameters that affect the final translation quality, memory usage, training stability...

Bibliographic Details
Main Authors: Popel, Martin; Bojar, Ondřej
Format: Article
Language: English
Published: Sciendo 2018-04-01
Series: Prague Bulletin of Mathematical Linguistics
Online Access: https://doi.org/10.2478/pralin-2018-0002