Methods for Detoxification of Texts for the Russian Language
We introduce the first study of the automatic detoxification of Russian texts to combat offensive language. This kind of textual style transfer can be used for processing toxic content on social media or for eliminating toxicity in automatically generated texts. While much work has been done for the English language in this field, there are no works on detoxification for the Russian language. We suggest two types of models—an approach based on BERT architecture that performs local corrections and a supervised approach based on a pretrained GPT-2 language model. We compare these methods with several baselines. In addition, we provide the training datasets and describe the evaluation setup and metrics for automatic and manual evaluation. The results show that the tested approaches can be successfully used for detoxification, although there is room for improvement.
Main Authors: | Daryna Dementieva, Daniil Moskovskiy, Varvara Logacheva, David Dale, Olga Kozlova, Nikita Semenov, Alexander Panchenko |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-09-01 |
Series: | Multimodal Technologies and Interaction |
Subjects: | text style transfer; toxicity detection; detoxification; pretrained models |
Online Access: | https://www.mdpi.com/2414-4088/5/9/54 |
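As context for the baselines the abstract mentions alongside the BERT- and GPT-2-based models, a trivial word-deletion baseline (a common starting point in detoxification work) can be sketched as follows. This is an illustrative sketch only: the function name and the placeholder vocabulary are hypothetical and do not come from the paper.

```python
def delete_baseline(text: str, toxic_vocab: set[str]) -> str:
    """Naive detoxification baseline: drop every whitespace-separated
    token whose lowercase form appears in a given toxic word list."""
    return " ".join(t for t in text.split() if t.lower() not in toxic_vocab)

# Placeholder vocabulary and input (not real toxic terms).
print(delete_baseline("this badword sentence", {"badword"}))  # -> "this sentence"
```

Such a baseline preserves fluency poorly (deleting a word can break the sentence), which is one motivation for the learned, context-aware models the article proposes.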
Record ID: | doaj-9c4b392a802b4914818b9d98885c980f (DOAJ) |
---|---|
DOI: | 10.3390/mti5090054 |
ISSN: | 2414-4088 |
Citation: | Multimodal Technologies and Interaction, Vol. 5, Issue 9, Article 54, 2021-09-01 |
Affiliations: | Daryna Dementieva, Daniil Moskovskiy, Varvara Logacheva, David Dale, Alexander Panchenko: Skolkovo Institute of Science and Technology, 121205 Moscow, Russia; Olga Kozlova, Nikita Semenov: Mobile TeleSystems (MTS), 109147 Moscow, Russia |