Investigation of text data augmentation for transformer training via translation technique

Data augmentation can improve a model’s final accuracy by introducing new data samples to the dataset. In this paper, text data augmentation using a translation technique is investigated. Synthetic translations generated by the Opus-MT model are compared to unique foreign data samples in terms of an i...

Bibliographic Details
Main Author: Dominykas Šeputis
Format: Article
Language: English
Published: Vilnius University Press 2021-05-01
Series: Vilnius University Open Series
Subjects:
Online Access: https://www.zurnalai.vu.lt/open-series/article/view/24036
id doaj-beb07bbf32334dc7ac4fa503dbd8adc6
record_format Article
spelling doaj-beb07bbf32334dc7ac4fa503dbd8adc6 | 2021-05-14T09:25:43Z | eng | Vilnius University Press | Vilnius University Open Series | 2669-0535 | 2021-05-01 | 10.15388/LMITT.2021.11 | Investigation of text data augmentation for transformer training via translation technique | Dominykas Šeputis | Vilnius University, Lithuania | Data augmentation can improve a model’s final accuracy by introducing new data samples to the dataset. In this paper, text data augmentation using a translation technique is investigated. Synthetic translations generated by the Opus-MT model are compared to unique foreign data samples in terms of their impact on the performance of transformer network-based models. The experimental results showed that multilingual models like DistilBERT in some cases benefit from the introduction of additional artificially created data samples presented in a foreign language. | https://www.zurnalai.vu.lt/open-series/article/view/24036 | none
collection DOAJ
language English
format Article
sources DOAJ
author Dominykas Šeputis
title Investigation of text data augmentation for transformer training via translation technique
publisher Vilnius University Press
series Vilnius University Open Series
issn 2669-0535
publishDate 2021-05-01
description Data augmentation can improve a model’s final accuracy by introducing new data samples to the dataset. In this paper, text data augmentation using a translation technique is investigated. Synthetic translations generated by the Opus-MT model are compared to unique foreign data samples in terms of their impact on the performance of transformer network-based models. The experimental results showed that multilingual models like DistilBERT in some cases benefit from the introduction of additional artificially created data samples presented in a foreign language.
topic none
url https://www.zurnalai.vu.lt/open-series/article/view/24036