Investigation of text data augmentation for transformer training via translation technique
Data augmentation can improve a model’s final accuracy by introducing new data samples to the dataset. In this paper, text data augmentation using a translation technique is investigated. Synthetic translations generated by the Opus-MT model are compared to the unique foreign data samples in terms of an i...
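The augmentation idea summarized above can be illustrated with a minimal sketch. It is not the author's implementation: the function name `augment_with_translations`, the `(text, label)` sample format, and the stand-in `toy_translate` are all hypothetical. In a real experiment the `translate` callable would wrap a machine translation model (for instance one of the Opus-MT checkpoints published under `Helsinki-NLP/opus-mt-*`), which is assumed here rather than shown.

```python
from typing import Callable, List, Tuple

# Hypothetical sample format: (text, class label).
Sample = Tuple[str, int]


def augment_with_translations(
    samples: List[Sample],
    translate: Callable[[List[str]], List[str]],
) -> List[Sample]:
    """Return the original samples followed by translated copies.

    Each translated copy keeps the label of the sample it came from.
    """
    texts = [text for text, _ in samples]
    translated = translate(texts)
    if len(translated) != len(texts):
        raise ValueError("translator must return one output per input")
    synthetic = [(new_text, label) for new_text, (_, label) in zip(translated, samples)]
    return samples + synthetic


def toy_translate(texts: List[str]) -> List[str]:
    # Stand-in for a real translation model, for illustration only:
    # it merely tags each text instead of translating it.
    return [f"[de] {text}" for text in texts]


train = [("good movie", 1), ("bad plot", 0)]
augmented = augment_with_translations(train, toy_translate)
```

Passing the translator in as a callable keeps the sketch independent of any particular model, so synthetic translations and genuine foreign-language samples can be swapped in and compared under the same training setup.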
Main Author: | Dominykas Šeputis |
---|---|
Format: | Article |
Language: | English |
Published: | Vilnius University Press, 2021-05-01 |
Series: | Vilnius University Open Series |
Subjects: | |
Online Access: | https://www.zurnalai.vu.lt/open-series/article/view/24036 |
id | doaj-beb07bbf32334dc7ac4fa503dbd8adc6 |
---|---|
record_format | Article |
spelling | doaj-beb07bbf32334dc7ac4fa503dbd8adc6; 2021-05-14T09:25:43Z; eng; Vilnius University Press; Vilnius University Open Series; 2669-0535; 2021-05-01; 10.15388/LMITT.2021.11; Investigation of text data augmentation for transformer training via translation technique; Dominykas Šeputis; Vilnius University, Lithuania; Data augmentation can improve a model’s final accuracy by introducing new data samples to the dataset. In this paper, text data augmentation using a translation technique is investigated. Synthetic translations generated by the Opus-MT model are compared to the unique foreign data samples in terms of their impact on the performance of transformer network-based models. The experimental results showed that multilingual models like DistilBERT in some cases benefit from the introduction of additional artificially created data samples presented in a foreign language.; https://www.zurnalai.vu.lt/open-series/article/view/24036; none |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Dominykas Šeputis |
spellingShingle | Dominykas Šeputis; Investigation of text data augmentation for transformer training via translation technique; Vilnius University Open Series; none |
author_facet | Dominykas Šeputis |
author_sort | Dominykas Šeputis |
title | Investigation of text data augmentation for transformer training via translation technique |
title_short | Investigation of text data augmentation for transformer training via translation technique |
title_full | Investigation of text data augmentation for transformer training via translation technique |
title_fullStr | Investigation of text data augmentation for transformer training via translation technique |
title_full_unstemmed | Investigation of text data augmentation for transformer training via translation technique |
title_sort | investigation of text data augmentation for transformer training via translation technique |
publisher | Vilnius University Press |
series | Vilnius University Open Series |
issn | 2669-0535 |
publishDate | 2021-05-01 |
description | Data augmentation can improve a model’s final accuracy by introducing new data samples to the dataset. In this paper, text data augmentation using a translation technique is investigated. Synthetic translations generated by the Opus-MT model are compared to the unique foreign data samples in terms of their impact on the performance of transformer network-based models. The experimental results showed that multilingual models like DistilBERT in some cases benefit from the introduction of additional artificially created data samples presented in a foreign language. |
topic | none |
url | https://www.zurnalai.vu.lt/open-series/article/view/24036 |
work_keys_str_mv | AT dominykasseputis investigationoftextdataaugmentationfortransformertrainingviatranslationtechnique |
_version_ | 1721441216396001280 |