QUANTIFYING SEMANTIC SHIFT VISUALLY ON A MALAY DOMAIN SPECIFIC CORPUS USING TEMPORAL WORD EMBEDDING APPROACH

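In this study, we propose an alternative approach to analyzing a domain-specific time-series corpus in order to detect word evolution. The method trains a temporal word embedding (TWE) model on a target corpus organized as a time series. The advantage of a TWE model is that it shows how the meaning of a word changes over time. We chose the TWEC (Temporal Word Embeddings with a Compass) approach to build a TWE model of a Malay domain-specific time-series corpus, the Malaysian Hansard Corpus (MHC), and named the resulting model MHC-TWEC. Two primary analyses, a self-similarity analysis and a user-defined method analysis, were performed to validate how effectively the MHC-TWEC model quantifies semantic shift in the MHC visually. Through visual inspection of these analyses, we find that the TWE model can capture semantic shift in the temporal corpus (the MHC).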
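One common way to compute a self-similarity score is to compare a word's vector in each time slice with its vector in a reference slice; a falling cosine similarity suggests that the word's usage has drifted. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the per-slice vectors already share one coordinate space (which TWEC's compass is intended to provide, so no extra alignment step is applied), takes one slice as the reference, and uses cosine similarity as the score. The function names `cosine` and `self_similarity`, the slice labels, and the toy vectors are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)


def self_similarity(slices: Dict[str, Sequence[float]], reference: str) -> Dict[str, float]:
    """Similarity of one word's vector in every slice to its vector in `reference`.

    Assumption: all slice vectors live in one shared space (as TWEC-style
    training is meant to ensure), so they are compared directly.
    """
    ref_vec = slices[reference]
    return {label: cosine(ref_vec, vec) for label, vec in slices.items()}


if __name__ == "__main__":
    # Hypothetical 3-d vectors for a single word in three time slices.
    word_vectors = {
        "slice_1": [1.0, 0.0, 0.0],
        "slice_2": [0.8, 0.6, 0.0],
        "slice_3": [0.0, 1.0, 0.0],
    }
    for label, sim in self_similarity(word_vectors, "slice_1").items():
        print(label, round(sim, 3))
```

With these toy vectors the score falls from 1.0 for the reference slice to 0.0 for the last slice, whose vector is orthogonal to the reference; plotting such scores per slice is one way to view a shift visually.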

Bibliographic Details
Main Authors: Sabrina Tiun, Saidah Saad, Nor Fariza Mohd Noor, Azhar Jalaludin, Anis Nadiah Che Abdul Rahman
Format: Article
Language: English
Published: UKM Press 2020-12-01
Series: Asia-Pacific Journal of Information Technology and Multimedia
Subjects: temporal word embedding; temporal corpus; Malaysian Hansard Corpus
ISSN: 2289-2192
DOI: https://doi.org/10.17576/apjitm-2020-0902-01
Online Access: https://www.ukm.my/apjitm/view.php?id=197
Source: DOAJ