The solution of the problem of unknown words under neural machine translation of the Kazakh language
The paper proposes a solution to the problem of unknown words in neural machine translation (NMT). The solution is demonstrated on NMT for the Kazakh-English language pair. The novelty of the proposed technology for solving the problem of unknown words in the NMT of the Kazakh langu...
Main Authors: | Aliya Turganbayeva, Ualsher Tukeyev |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2021-04-01 |
Series: | Journal of Information and Telecommunication |
Subjects: | neural machine translation; unknown words; Kazakh language |
Online Access: | http://dx.doi.org/10.1080/24751839.2020.1838713 |
id |
doaj-16a16a5ca2704fefbc36a2c6ed76b42c |
---|---|
source |
Journal of Information and Telecommunication, vol. 5, no. 2, pp. 214-225; DOI 10.1080/24751839.2020.1838713 |
affiliation |
Aliya Turganbayeva, Ualsher Tukeyev: Al-Farabi Kazakh National University |
collection |
DOAJ |
author |
Aliya Turganbayeva Ualsher Tukeyev |
issn |
2475-1839 2475-1847 |
description |
The paper proposes a solution to the problem of unknown words in neural machine translation (NMT). The solution is demonstrated on NMT for the Kazakh-English language pair. The novelty of the proposed technology lies in an algorithm that searches for unknown words against the vocabulary of a trained NMT model and uses a dictionary of Kazakh synonyms to replace each unknown word with a word close in meaning. Once an unknown word has been identified, the synonym dictionary is searched for words similar in meaning to it, and the synonyms found are checked for presence in the trained model's vocabulary. The edited source-language sentence is then translated again. A database of Kazakh synonyms was collected, and software implementing the solution was developed in Python. The proposed technology was tested on two parallel Kazakh-English corpora in two configurations: baseline NMT and NMT using the proposed technology. |
topic |
neural machine translation unknown words kazakh language |
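The procedure summarized in the description (flag source words missing from the trained model's vocabulary, look up Kazakh synonyms, keep only synonyms the model knows, then translate the edited sentence again) can be illustrated with a minimal Python sketch. This is not the authors' code. Whitespace tokenization, choosing the first in-vocabulary synonym, the function names, the `translate` callback standing in for a trained NMT model, and the toy vocabulary and synonym entries (`TOY_VOCAB`, `TOY_SYNONYMS`) are all hypothetical assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy data (assumption): a tiny model vocabulary and synonym dictionary.
TOY_VOCAB: Set[str] = {"бұл", "үй", "әдемі"}
TOY_SYNONYMS: Dict[str, List[str]] = {"көркем": ["сұлу", "әдемі"]}


def find_unknown_words(tokens: List[str], vocab: Set[str]) -> List[str]:
    """Return the source tokens that are absent from the model's vocabulary."""
    return [tok for tok in tokens if tok not in vocab]


def replace_unknown_words(
    tokens: List[str], vocab: Set[str], synonyms: Dict[str, List[str]]
) -> List[str]:
    """Replace each unknown token with its first synonym the model knows.

    Tokens without an in-vocabulary synonym are left unchanged (assumption).
    """
    edited = []
    for tok in tokens:
        if tok in vocab:
            edited.append(tok)
            continue
        # Keep only synonyms that are present in the trained model's vocabulary.
        candidates = [s for s in synonyms.get(tok, []) if s in vocab]
        edited.append(candidates[0] if candidates else tok)
    return edited


def translate_with_synonyms(
    sentence: str,
    vocab: Set[str],
    synonyms: Dict[str, List[str]],
    translate: Callable[[str], str],
) -> str:
    """Translate directly if every word is known; otherwise edit, then retranslate."""
    tokens = sentence.split()  # assumption: whitespace tokenization
    if not find_unknown_words(tokens, vocab):
        return translate(sentence)
    return translate(" ".join(replace_unknown_words(tokens, vocab, synonyms)))
```

With the toy data and an identity function in place of a real model, `translate_with_synonyms("бұл үй көркем", TOY_VOCAB, TOY_SYNONYMS, translate=lambda s: s)` returns `"бұл үй әдемі"`: "көркем" is flagged as unknown, "сұлу" is rejected because the model does not know it, and "әдемі" is substituted. A production system would more likely check candidates against the model's subword vocabulary and might rank synonyms rather than taking the first match.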