Training Tips for the Transformer Model
This article describes our experiments in neural machine translation using the recent Tensor2Tensor framework and the Transformer sequence-to-sequence model (Vaswani et al., 2017). We examine some of the critical parameters that affect the final translation quality, memory usage, training stability...
Main Authors: |
Format: Article
Language: English
Published: Sciendo, 2018-04-01
Series: Prague Bulletin of Mathematical Linguistics
Online Access: https://doi.org/10.2478/pralin-2018-0002