Narcotic-related tweet classification in Asia using sentence vector of word embedding with feature extension

Currently, Asia faces a narcotic drug addiction problem. On social networking services such as Twitter, some drug-addicted users converse about behaviours related to narcotic drugs. This research proposes a new Narcotic-related Tweet Classification Model (NTCM) that uses data preprocessing. Two new...

Bibliographic Details
Main Authors: Narongsak Chayangkoon, Anongnart Srivihok
Format: Article
Language: English
Published: Khon Kaen University 2021-07-01
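Series: Engineering and Applied Science Research
Subjects: data mining, data preprocessing, feature reduction, narcotic drug, text classification, text vectorization, word embedding
Online Access: https://ph01.tci-thaijo.org/index.php/easr/article/download/243616/166483/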
id doaj-ebde780b1f5442c282801bbb7081e45f
record_format Article
spelling doaj-ebde780b1f5442c282801bbb7081e45f; 2021-07-12T04:08:33Z; eng; Khon Kaen University; Engineering and Applied Science Research; 2539-6161; 2539-6218; 2021-07-01; 48; 5; 547; 559; Narcotic-related tweet classification in Asia using sentence vector of word embedding with feature extension; Narongsak Chayangkoon; Anongnart Srivihok; https://ph01.tci-thaijo.org/index.php/easr/article/download/243616/166483/; data mining; data preprocessing; feature reduction; narcotic drug; text classification; text vectorization; word embedding
collection DOAJ
language English
format Article
sources DOAJ
author Narongsak Chayangkoon
Anongnart Srivihok
spellingShingle Narongsak Chayangkoon
Anongnart Srivihok
Narcotic-related tweet classification in Asia using sentence vector of word embedding with feature extension
Engineering and Applied Science Research
data mining
data preprocessing
feature reduction
narcotic drug
text classification
text vectorization
word embedding
author_facet Narongsak Chayangkoon
Anongnart Srivihok
author_sort Narongsak Chayangkoon
title Narcotic-related tweet classification in Asia using sentence vector of word embedding with feature extension
title_sort narcotic-related tweet classification in asia using sentence vector of word embedding with feature extension
publisher Khon Kaen University
series Engineering and Applied Science Research
issn 2539-6161
2539-6218
publishDate 2021-07-01
description Currently, Asia faces a narcotic drug addiction problem. On social networking services such as Twitter, some drug-addicted users converse about behaviours related to narcotic drugs. This research proposes a new Narcotic-related Tweet Classification Model (NTCM) that uses data preprocessing. Two new data preprocessing methods, Sentence Vector of Word Embedding (SVWE) and Sentence Vector of Word Embedding with Feature Extension (SWEF), are introduced to prepare data for the NTCM. The proposed preprocessing reduces the dataset to produce an SVWE. Word embeddings are generated by deep neural networks using the skip-gram model. The authors then extended the SVWE with additional features to produce a new dataset called SWEF; both datasets were used as input data for the NTCM. The authors collected tweets from Asia using keywords related to narcotic drugs. They investigated text classification models built with a Support Vector Machine, Logistic Regression, a Decision Tree, and a Convolutional Neural Network. Logistic Regression with the SWEF provided the best approach for the NTCM compared with state-of-the-art methods. The proposed NTCM demonstrated correctness and fitness, with an accuracy of 0.8964, an F-Measure of 0.895, an AUC of 0.949, a Kappa of 0.7131, an MCC of 0.714, and a low running time of 1.04 seconds.
topic data mining
data preprocessing
feature reduction
narcotic drug
text classification
text vectorization
word embedding
url https://ph01.tci-thaijo.org/index.php/easr/article/download/243616/166483/
work_keys_str_mv AT narongsakchayangkoon narcoticrelatedtweetclassificationinasiausingsentencevectorofwordembeddingwithfeatureextension
AT anongnartsrivihok narcoticrelatedtweetclassificationinasiausingsentencevectorofwordembeddingwithfeatureextension
_version_ 1721307958636183552
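
Illustrative sketch (not part of the catalogued record): the description field mentions sentence vectors built from word embeddings, a feature extension step, and evaluation by accuracy, F-Measure, Kappa, and MCC. The Python below is only a rough illustration of those ideas under stated assumptions. Mean pooling of word vectors is an assumption, since the record does not say how the SVWE is computed; the two extra features (token count and hashtag count) are hypothetical and are not taken from the article; the function names and the toy embeddings are invented for this example. The metric formulas are the standard binary-classification definitions.

```python
import math


def sentence_vector(tokens, embeddings, dim):
    """Mean of the embeddings of the tokens found in the vocabulary.

    Mean pooling is an assumption; the record does not state the pooling rule.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim  # no known token: fall back to a zero vector
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def extend_features(vector, tokens):
    """Append two hypothetical extra features: token count and hashtag count."""
    hashtags = sum(1 for t in tokens if t.startswith("#"))
    return vector + [float(len(tokens)), float(hashtags)]


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, F-measure, Cohen's kappa and MCC from binary confusion counts.

    Assumes every denominator below is non-zero.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    # Agreement expected by chance, from predicted and actual class totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "f_measure": f_measure,
        "kappa": kappa,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy two-dimensional embeddings, invented for the example.
    toy = {"alpha": [1.0, 3.0], "beta": [3.0, 5.0]}
    tokens = ["alpha", "beta", "#gamma"]
    print(extend_features(sentence_vector(tokens, toy, 2), tokens))
    print(binary_metrics(tp=40, fp=10, fn=10, tn=40))
```

In a fuller pipeline the extended vectors would be passed to a classifier such as Logistic Regression, and the confusion counts would come from its predictions on held-out tweets; that step is omitted here to keep the sketch free of external libraries.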