Improving Low-Resource Neural Machine Translation With Teacher-Free Knowledge Distillation
Knowledge Distillation (KD) aims to distill the knowledge of a cumbersome teacher model into a lightweight student model. Its success is generally attributed to the privileged information on similarities among categories provided by the teacher model, and in this sense, only strong teacher models are...
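The abstract's description of standard KD can be made concrete with a minimal sketch of the usual distillation objective: the student is trained on the gold labels plus a soft-label term that pulls it toward the teacher's temperature-softened distribution, which is where the inter-category similarity information comes from. This is a generic illustration only, not the teacher-free variant proposed in the article; the function name `kd_loss` and the hyper-parameters `temperature` and `alpha` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, temperature=2.0, alpha=0.5):
    """Generic knowledge-distillation objective (illustrative sketch only)."""
    # Hard-label cross-entropy on the gold targets.
    ce = F.cross_entropy(student_logits, targets)

    # Soft-label KL divergence between temperature-scaled distributions;
    # the T^2 factor keeps gradient magnitudes comparable across temperatures.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kl = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

    # Weighted sum of the hard-label and soft-label terms.
    return alpha * ce + (1.0 - alpha) * kl
```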
| Main Authors: | Xinlu Zhang, Xiao Li, Yating Yang, Rui Dong |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2020-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/9257421/ |
Similar Items
- Grammatically Derived Factual Relation Augmented Neural Machine Translation
  by: Li, F., et al.
  Published: (2022)
- Head Network Distillation: Splitting Distilled Deep Neural Networks for Resource-Constrained Edge Computing Systems
  by: Yoshitomo Matsubara, et al.
  Published: (2020-01-01)
- Variational Bayesian Group-Level Sparsification for Knowledge Distillation
  by: Yue Ming, et al.
  Published: (2020-01-01)
- Review of Knowledge Distillation in Convolutional Neural Network Compression
  by: MENG Xianfa, LIU Fang, LI Guang, HUANG Mengmeng
  Published: (2021-10-01)
- Layer-Level Knowledge Distillation for Deep Neural Network Learning
  by: Hao-Ting Li, et al.
  Published: (2019-05-01)