Improving Low-Resource Neural Machine Translation With Teacher-Free Knowledge Distillation
Knowledge Distillation (KD) aims to distill the knowledge of a cumbersome teacher model into a lightweight student model. Its success is generally attributed to the privileged information on similarities among categories provided by the teacher model, and in this sense only strong teacher models are deployed to teach weaker students in practice. In low-resource neural machine translation, however, a stronger teacher model is not available. We therefore propose a novel Teacher-free Knowledge Distillation framework for low-resource neural machine translation, in which the model learns from a manually designed regularization distribution that acts as a virtual teacher. This hand-crafted prior distribution not only captures similarity information between words but also provides effective regularization for model training. Experimental results show that the proposed method effectively improves translation performance on low-resource languages.
Main Authors: | Xinlu Zhang, Xiao Li, Yating Yang, Rui Dong |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
Subjects: | Neural machine translation; knowledge distillation; prior knowledge |
Online Access: | https://ieeexplore.ieee.org/document/9257421/ |
id |
doaj-235779ca011d4329b08486a59975cbee |
record_format |
Article |
spelling |
doaj-235779ca011d4329b08486a59975cbee (indexed 2021-03-30T04:18:11Z). English. IEEE, IEEE Access, ISSN 2169-3536, vol. 8, pp. 206638-206645, published 2020-01-01. DOI: 10.1109/ACCESS.2020.3037821, IEEE document 9257421. Improving Low-Resource Neural Machine Translation With Teacher-Free Knowledge Distillation. Xinlu Zhang (https://orcid.org/0000-0003-3553-5956), Xiao Li, Yating Yang (https://orcid.org/0000-0002-2639-3944), Rui Dong (https://orcid.org/0000-0002-4110-3976), all of the Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China. https://ieeexplore.ieee.org/document/9257421/ Keywords: Neural machine translation; knowledge distillation; prior knowledge. |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Xinlu Zhang, Xiao Li, Yating Yang, Rui Dong |
author_sort |
Xinlu Zhang |
title |
Improving Low-Resource Neural Machine Translation With Teacher-Free Knowledge Distillation |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2020-01-01 |
description |
Knowledge Distillation (KD) aims to distill the knowledge of a cumbersome teacher model into a lightweight student model. Its success is generally attributed to the privileged information on similarities among categories provided by the teacher model, and in this sense only strong teacher models are deployed to teach weaker students in practice. In low-resource neural machine translation, however, a stronger teacher model is not available. We therefore propose a novel Teacher-free Knowledge Distillation framework for low-resource neural machine translation, in which the model learns from a manually designed regularization distribution that acts as a virtual teacher. This hand-crafted prior distribution not only captures similarity information between words but also provides effective regularization for model training. Experimental results show that the proposed method effectively improves translation performance on low-resource languages. |
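The description sketches the method only at a high level, so here is a minimal, hedged sketch of how a token-level teacher-free KD objective of this kind could look in practice. It assumes the virtual teacher is a hand-crafted distribution that puts probability `peak` on the gold token and spreads the remainder uniformly over the vocabulary; the function name `teacher_free_kd_loss`, the hyperparameters `peak`, `alpha`, and `temperature`, and the use of PyTorch are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch of a teacher-free KD loss for NMT, under the assumptions
# stated above (not necessarily the paper's exact formulation).
import torch
import torch.nn.functional as F


def teacher_free_kd_loss(logits, targets, vocab_size, peak=0.9,
                         alpha=0.5, temperature=1.0, pad_id=0):
    """logits: (batch, seq_len, vocab_size); targets: (batch, seq_len)."""
    logits = logits.reshape(-1, vocab_size)
    targets = targets.reshape(-1)
    mask = targets.ne(pad_id).float()

    # Standard cross-entropy against the gold tokens.
    ce = F.cross_entropy(logits, targets, reduction="none")

    # Manually designed virtual teacher: probability `peak` on the gold
    # token, the remainder spread uniformly over the rest of the vocabulary.
    virtual_teacher = torch.full_like(logits, (1.0 - peak) / (vocab_size - 1))
    virtual_teacher.scatter_(1, targets.unsqueeze(1), peak)

    # KL divergence between the virtual teacher and the tempered student.
    log_student = F.log_softmax(logits / temperature, dim=-1)
    kd = F.kl_div(log_student, virtual_teacher, reduction="none").sum(dim=-1)

    # Interpolate the two terms and average over non-padding tokens.
    loss = (1.0 - alpha) * ce + alpha * (temperature ** 2) * kd
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)
```

With temperature set to 1, the KL term reduces (up to a constant) to a label-smoothing penalty, which matches the abstract's framing of the hand-crafted prior as an effective regularizer for training.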
topic |
Neural machine translation; knowledge distillation; prior knowledge |
url |
https://ieeexplore.ieee.org/document/9257421/ |