The solution of the problem of unknown words under neural machine translation of the Kazakh language

The paper proposes a solution to the problem of unknown words in neural machine translation (NMT), demonstrated on the Kazakh-English language pair. The novelty of the proposed technology for solving the problem of unknown words in the NMT of the Kazakh langu...

Bibliographic Details
Main Authors: Aliya Turganbayeva, Ualsher Tukeyev
Format: Article
Language: English
Published: Taylor & Francis Group 2021-04-01
Series: Journal of Information and Telecommunication
Subjects: neural machine translation; unknown words; Kazakh language
Online Access: http://dx.doi.org/10.1080/24751839.2020.1838713
id doaj-16a16a5ca2704fefbc36a2c6ed76b42c
record_format Article
spelling doaj-16a16a5ca2704fefbc36a2c6ed76b42c; 2021-06-02T10:12:15Z; eng; Taylor & Francis Group; Journal of Information and Telecommunication; 2475-1839; 2475-1847; 2021-04-01; 5; 2; 214; 225; 10.1080/24751839.2020.1838713; 1838713; The solution of the problem of unknown words under neural machine translation of the Kazakh language; Aliya Turganbayeva (0); Ualsher Tukeyev (1); Al-Farabi Kazakh National University; Al-Farabi Kazakh National University; http://dx.doi.org/10.1080/24751839.2020.1838713; neural machine translation; unknown words; kazakh language
collection DOAJ
language English
format Article
sources DOAJ
author Aliya Turganbayeva
Ualsher Tukeyev
title The solution of the problem of unknown words under neural machine translation of the Kazakh language
publisher Taylor & Francis Group
series Journal of Information and Telecommunication
issn 2475-1839
2475-1847
publishDate 2021-04-01
description The paper proposes a solution to the problem of unknown words in neural machine translation (NMT), demonstrated on the Kazakh-English language pair. The novelty of the approach is an algorithm that searches for unknown words in the vocabulary of a trained NMT model and uses a dictionary of Kazakh synonyms to replace each unknown word with a word that is close in meaning. The synonym dictionary is used to find words similar in meaning to the unknown words that were identified, and the synonyms found are then checked for presence in the vocabulary of the trained model. After that, the edited source-language sentence is translated again. A base of Kazakh synonyms was collected, and software implementing the solution was developed in Python. The proposed technology was tested on two parallel Kazakh-English corpora in two variants: baseline NMT and NMT with the proposed technology.
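The replacement step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name replace_unknown_words, the whitespace tokenization, the lowercased exact matching, and the toy vocabulary and synonym entries are all assumptions made for this example. The final step, translating the edited sentence again with the NMT model, is not shown.

```python
from typing import Dict, List, Set


def replace_unknown_words(
    sentence: str,
    model_vocab: Set[str],
    synonyms: Dict[str, List[str]],
) -> str:
    """Replace out-of-vocabulary words with in-vocabulary synonyms.

    Hypothetical helper: whitespace tokenization and lowercased exact
    matching are simplifying assumptions, not the paper's method.
    """
    result = []
    for token in sentence.split():
        word = token.lower()
        if word in model_vocab:
            # Known word: keep it unchanged.
            result.append(token)
            continue
        # Unknown word: take the first synonym that the model knows.
        replacement = next(
            (s for s in synonyms.get(word, []) if s.lower() in model_vocab),
            token,  # no usable synonym found: leave the word as it is
        )
        result.append(replacement)
    return " ".join(result)


# Toy data, invented for illustration only.
vocab = {"мен", "үлкен", "үй", "көрдім"}
synonym_dict = {"зор": ["дәу", "үлкен"]}
edited = replace_unknown_words("мен зор үй көрдім", vocab, synonym_dict)
print(edited)
```

In this sketch a synonym is accepted only if it is itself in the model vocabulary, which mirrors the check described above. A word with no usable synonym is passed through unchanged, so the sentence can always be sent back to the translation model.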
topic neural machine translation
unknown words
kazakh language
url http://dx.doi.org/10.1080/24751839.2020.1838713