Replacing Out-of-Vocabulary Words with an Appropriate Synonym Based on Word2VnCR

The most typical problem in an analysis of natural language is finding synonyms of out-of-vocabulary (OOV) words. When someone tries to understand a sentence containing an OOV word, the person determines the most appropriate meaning of a replacement word using the meanings of co-occurrence words und...

Bibliographic Details
Main Authors: Jeongin Kim, Taekeun Hong, Pankoo Kim
Format: Article
Language: English
Published: Hindawi Limited 2021-01-01
Series: Mobile Information Systems
Online Access: http://dx.doi.org/10.1155/2021/5548426
id doaj-38585d00e35f4b759b08b51cb2cdaecc
record_format Article
spelling doaj-38585d00e35f4b759b08b51cb2cdaecc; 2021-07-26T00:33:58Z; eng; Hindawi Limited; Mobile Information Systems; 1875-905X; 2021-01-01; 2021; 10.1155/2021/5548426; Replacing Out-of-Vocabulary Words with an Appropriate Synonym Based on Word2VnCR; Jeongin Kim [0]; Taekeun Hong [1]; Pankoo Kim [2]; [0] Department of Computer Engineering; [1] Department of Computer Engineering; [2] Department of Computer Engineering; http://dx.doi.org/10.1155/2021/5548426
collection DOAJ
language English
format Article
sources DOAJ
author Jeongin Kim
Taekeun Hong
Pankoo Kim
publisher Hindawi Limited
series Mobile Information Systems
issn 1875-905X
publishDate 2021-01-01
description The most typical problem in natural language analysis is finding synonyms of out-of-vocabulary (OOV) words. When a person tries to understand a sentence containing an OOV word, they infer the most appropriate meaning of a replacement word from the meanings of words that co-occur in the same context, based on the conceptual system they have learned. This study proposes a word-to-vector and conceptual relationship (Word2VnCR) algorithm that replaces an OOV word that leads to an erroneous morphemic analysis with an appropriate synonym. The Word2VnCR algorithm improves on the conventional Word2Vec algorithm, which has difficulty suggesting a replacement word because it does not determine the similarity of the word. After word embeddings are learned from the training dataset, replacement word candidates for the OOV word are extracted. The semantic similarity between each extracted candidate and the words neighboring the OOV word is measured, and the candidate with the highest similarity value is selected as the replacement. To evaluate the proposed Word2VnCR algorithm, a comparative experiment was conducted against the Word2Vec algorithm. The experimental results indicate that the proposed algorithm achieves higher accuracy than the Word2Vec algorithm.
url http://dx.doi.org/10.1155/2021/5548426
_version_ 1721282514450907136
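
The selection step summarized in the description (score each replacement candidate against the words neighboring the OOV word, then keep the highest-scoring candidate) can be illustrated with a minimal sketch. This is not the authors' Word2VnCR implementation: the toy embedding values, the function names `cosine` and `pick_replacement`, and the choice of mean cosine similarity as the score are all assumptions made for illustration. The record does not say how the similarities are combined, and the conceptual-relationship component is omitted entirely.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration only.
# In practice these would come from a trained word-embedding model.
EMBEDDINGS = {
    "bank": [0.9, 0.1, 0.0],
    "money": [0.8, 0.2, 0.1],
    "loan": [0.85, 0.15, 0.05],
    "river": [0.1, 0.9, 0.2],
    "shore": [0.15, 0.85, 0.25],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pick_replacement(candidates, neighbors, embeddings):
    """Return the candidate with the highest mean similarity to the neighbors.

    Assumption: the score is the mean cosine similarity over the neighboring
    words that have an embedding. Returns None if nothing can be scored.
    """
    known = [embeddings[w] for w in neighbors if w in embeddings]
    best_word, best_score = None, float("-inf")
    if not known:
        return best_word
    for cand in candidates:
        if cand not in embeddings:
            continue  # a candidate without a vector cannot be scored
        score = sum(cosine(embeddings[cand], n) for n in known) / len(known)
        if score > best_score:
            best_word, best_score = cand, score
    return best_word


# Usage: the same candidate list resolves differently depending on context.
financial = pick_replacement(["bank", "shore"], ["money", "loan"], EMBEDDINGS)
waterside = pick_replacement(["bank", "shore"], ["river"], EMBEDDINGS)
```

With these toy vectors, the financial context favors "bank" and the waterside context favors "shore", which is the context sensitivity the description attributes to comparing candidates with the surrounding words.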