Replacing Out-of-Vocabulary Words with an Appropriate Synonym Based on Word2VnCR
The most typical problem in an analysis of natural language is finding synonyms of out-of-vocabulary (OOV) words. When someone tries to understand a sentence containing an OOV word, the person determines the most appropriate meaning of a replacement word using the meanings of co-occurrence words und...
Main Authors: | Jeongin Kim, Taekeun Hong, Pankoo Kim |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2021-01-01 |
Series: | Mobile Information Systems |
Online Access: | http://dx.doi.org/10.1155/2021/5548426 |
id |
doaj-38585d00e35f4b759b08b51cb2cdaecc |
---|---|
record_format |
Article |
spelling |
doaj-38585d00e35f4b759b08b51cb2cdaecc; 2021-07-26T00:33:58Z; eng; Hindawi Limited; Mobile Information Systems; 1875-905X; 2021-01-01; 2021; 10.1155/2021/5548426; Replacing Out-of-Vocabulary Words with an Appropriate Synonym Based on Word2VnCR; Jeongin Kim 0; Taekeun Hong 1; Pankoo Kim 2; Department of Computer Engineering; Department of Computer Engineering; Department of Computer Engineering; The most typical problem in an analysis of natural language is finding synonyms of out-of-vocabulary (OOV) words. When someone tries to understand a sentence containing an OOV word, the person determines the most appropriate meaning of a replacement word using the meanings of co-occurrence words under the same context based on the conceptual system learned. In this study, a word-to-vector and conceptual relationship (Word2VnCR) algorithm is proposed that replaces an OOV word leading to an erroneous morphemic analysis with an appropriate synonym. The Word2VnCR algorithm is an improvement over the conventional Word2Vec algorithm, which has a problem in suggesting a replacement word by not determining the similarity of the word. After word-embedding learning is conducted using the learning dataset, the replacement word candidates of the OOV word are extracted. The semantic similarities of the extracted replacement word candidates are measured with the surrounding neighboring words of the OOV word, and a replacement word having the highest similarity value is selected as a replacement. To evaluate the performance of the proposed Word2VnCR algorithm, a comparative experiment was conducted using the Word2VnCR and Word2Vec algorithms. As the experimental results indicate, the proposed algorithm shows a higher accuracy than the Word2Vec algorithm.; http://dx.doi.org/10.1155/2021/5548426 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Jeongin Kim Taekeun Hong Pankoo Kim |
title |
Replacing Out-of-Vocabulary Words with an Appropriate Synonym Based on Word2VnCR |
publisher |
Hindawi Limited |
series |
Mobile Information Systems |
issn |
1875-905X |
publishDate |
2021-01-01 |
description |
The most typical problem in an analysis of natural language is finding synonyms of out-of-vocabulary (OOV) words. When someone tries to understand a sentence containing an OOV word, the person determines the most appropriate meaning of a replacement word using the meanings of co-occurrence words under the same context based on the conceptual system learned. In this study, a word-to-vector and conceptual relationship (Word2VnCR) algorithm is proposed that replaces an OOV word leading to an erroneous morphemic analysis with an appropriate synonym. The Word2VnCR algorithm is an improvement over the conventional Word2Vec algorithm, which has a problem in suggesting a replacement word by not determining the similarity of the word. After word-embedding learning is conducted using the learning dataset, the replacement word candidates of the OOV word are extracted. The semantic similarities of the extracted replacement word candidates are measured with the surrounding neighboring words of the OOV word, and a replacement word having the highest similarity value is selected as a replacement. To evaluate the performance of the proposed Word2VnCR algorithm, a comparative experiment was conducted using the Word2VnCR and Word2Vec algorithms. As the experimental results indicate, the proposed algorithm shows a higher accuracy than the Word2Vec algorithm. |
url |
http://dx.doi.org/10.1155/2021/5548426 |
_version_ |
1721282514450907136 |
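
The selection step summarized in the description (score each replacement candidate against the words surrounding the OOV word and keep the highest-scoring one) can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' Word2VnCR implementation: the record does not say how similarity is computed, so mean cosine similarity is assumed here, and the function names `cosine` and `pick_replacement` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pick_replacement(candidates, context_words, embeddings):
    """Return the candidate with the highest mean cosine similarity to the
    context words, or None if no candidate or no context word has a vector.

    Assumption: mean cosine similarity stands in for the similarity measure,
    which the record does not specify.
    """
    context = [embeddings[w] for w in context_words if w in embeddings]
    known = [c for c in candidates if c in embeddings]
    if not context or not known:
        return None

    def score(candidate):
        vec = embeddings[candidate]
        return sum(cosine(vec, ctx) for ctx in context) / len(context)

    return max(known, key=score)


# Hypothetical toy vectors, hand-written for illustration only; in practice
# these would come from a trained word-embedding model.
TOY_EMBEDDINGS = {
    "river": [0.9, 0.1, 0.0],
    "water": [0.8, 0.2, 0.1],
    "shore": [0.85, 0.15, 0.05],
    "loan": [0.1, 0.9, 0.2],
    "money": [0.05, 0.95, 0.1],
}

# Candidates for an OOV word that appears next to "river" and "water".
best = pick_replacement(["shore", "loan"], ["river", "water"], TOY_EMBEDDINGS)
print(best)
```

With these toy vectors the context words "river" and "water" favor "shore", while a context of "money" would favor "loan"; the point of the sketch is only that the neighboring words, not the candidate list alone, decide the replacement.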