Improving Low-Resource Neural Machine Translation With Teacher-Free Knowledge Distillation

Knowledge Distillation (KD) aims to distill the knowledge of a cumbersome teacher model into a lightweight student model. Its success is generally attributed to the privileged information on similarities among categories provided by the teacher model, and in this sense, only strong teacher models ar...
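
The abstract refers to the conventional teacher-student KD objective as the baseline that its teacher-free approach departs from. Below is a minimal sketch of that standard objective only, assuming a PyTorch setup; the function name, temperature `T`, and mixing weight `alpha` are illustrative assumptions and do not describe the teacher-free method proposed in the article.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard (teacher-based) knowledge distillation loss; illustrative sketch only."""
    # Soft targets: KL divergence between temperature-softened teacher and student
    # distributions, scaled by T^2 as is customary to keep gradient magnitudes comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```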


Bibliographic Details
Main Authors: Xinlu Zhang, Xiao Li, Yating Yang, Rui Dong
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9257421/