Narcotic-related tweet classification in Asia using sentence vector of word embedding with feature extension
Currently, Asia faces a narcotic drug addiction problem. On social networking services such as Twitter, some drug-addicted users converse about behaviours related to narcotic drugs. This research proposes a new Narcotic-related Tweet Classification Model (NTCM) that uses data preprocessing. Two new...
Main Authors: | Narongsak Chayangkoon, Anongnart Srivihok |
---|---|
Format: | Article |
Language: | English |
Published: | Khon Kaen University, 2021-07-01 |
Series: | Engineering and Applied Science Research |
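Subjects: | data mining; data preprocessing; feature reduction; narcotic drug; text classification; text vectorization; word embedding |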
Online Access: | https://ph01.tci-thaijo.org/index.php/easr/article/download/243616/166483/ |
id |
doaj-ebde780b1f5442c282801bbb7081e45f |
---|---|
spelling |
doaj-ebde780b1f5442c282801bbb7081e45f · 2021-07-12T04:08:33Z · eng · Khon Kaen University · Engineering and Applied Science Research · 2539-6161 · 2539-6218 · 2021-07-01 · 48 · 5 · 547 · 559 · Narcotic-related tweet classification in Asia using sentence vector of word embedding with feature extension · Narongsak Chayangkoon · Anongnart Srivihok · Currently, Asia faces a narcotic drug addiction problem. In social networking services, such as Twitter, some drug addicted users converse about behaviours related to narcotic drugs. This research proposes a new Narcotic-related Tweet Classification Model (NTCM) that uses data preprocessing. Two new data preprocessing methods, Sentence Vector of Word Embedding (SVWE) and Sentence Vector of Word Embedding with Feature Extension (SWEF), are introduced to prepare data for the NTCM. The proposed data preprocessing method uses the reduction of the dataset to produce an SVWE. Word embedding is generated by deep neural networks using the skip-gram model. The authors further extended some features to SVWE to produce a new dataset called SWEF; these datasets were used for the dataset in the NTCM. The authors collected data with keywords related to narcotic drugs from Twitter in Asia. The authors investigated a text classification model using a Support Vector Machine, Logistic Regression, a Decision Tree, and a Convolutional Neural Network. Logistic Regression with the SWEF provided the best approach for the NTCM compared with state-of-the-art methods. The proposed NTCM showed correctness and fitness by accuracy (0.8964), F-Measure (0.895), AUC (0.949), Kappa (0.7131), MCC (0.714), and low running time performance (1.04 seconds). · https://ph01.tci-thaijo.org/index.php/easr/article/download/243616/166483/ · data mining · data preprocessing · feature reduction · narcotic drug · text classification · text vectorization · word embedding |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Narongsak Chayangkoon; Anongnart Srivihok |
title |
Narcotic-related tweet classification in Asia using sentence vector of word embedding with feature extension |
publisher |
Khon Kaen University |
series |
Engineering and Applied Science Research |
issn |
2539-6161 2539-6218 |
publishDate |
2021-07-01 |
description |
Currently, Asia faces a narcotic drug addiction problem. On social networking services such as Twitter, some drug-addicted users converse about behaviours related to narcotic drugs. This research proposes a new Narcotic-related Tweet Classification Model (NTCM) that uses data preprocessing. Two new data preprocessing methods, Sentence Vector of Word Embedding (SVWE) and Sentence Vector of Word Embedding with Feature Extension (SWEF), are introduced to prepare data for the NTCM. The proposed preprocessing method reduces the dataset to produce an SVWE. The word embeddings are generated by deep neural networks using the skip-gram model. The authors then extended SVWE with additional features to produce a new dataset called SWEF; both datasets were used as input to the NTCM. The data were collected from Twitter in Asia using keywords related to narcotic drugs. The authors evaluated text classification with a Support Vector Machine, Logistic Regression, a Decision Tree, and a Convolutional Neural Network. Logistic Regression with SWEF gave the best results for the NTCM compared with state-of-the-art methods. The proposed NTCM achieved an accuracy of 0.8964, an F-measure of 0.895, an AUC of 0.949, a Kappa of 0.7131, an MCC of 0.714, and a low running time of 1.04 seconds. |
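A minimal sketch of the two preprocessing ideas named in the description, under stated assumptions: the record does not say how SVWE turns word vectors into a sentence vector or which features SWEF adds, so this sketch assumes the sentence vector is the mean of the word vectors of a tweet's tokens, and that "feature extension" appends extra numeric columns. `EMBEDDINGS`, `NARCOTIC_KEYWORDS`, and the two appended features (keyword hits, token count) are hypothetical placeholders, not the authors' choices. In practice the word vectors would come from a trained skip-gram model (for example gensim's `Word2Vec` with `sg=1`). The sketch also computes two of the reported metrics, Cohen's kappa and MCC, from a binary confusion matrix, using the standard definitions.

```python
import math

# Hypothetical toy embeddings; real vectors would come from a trained skip-gram model.
EMBEDDINGS = {
    "buy": [0.2, 0.1, 0.0],
    "ice": [0.9, 0.4, 0.1],
    "tonight": [0.1, 0.0, 0.3],
}

# Hypothetical keyword list used only to build the illustrative extra features.
NARCOTIC_KEYWORDS = {"ice", "meth", "yaba"}


def sentence_vector(tokens, embeddings, dim):
    """Average the vectors of in-vocabulary tokens (assumed SVWE-style sentence vector).

    Returns a zero vector when no token has an embedding.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def extend_features(vector, tokens, keywords):
    """Append two hypothetical features (keyword hits, token count) to the vector."""
    keyword_hits = sum(1 for t in tokens if t in keywords)
    return vector + [float(keyword_hits), float(len(tokens))]


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a binary confusion matrix (undefined if expected agreement is 1)."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement: P(pred pos) * P(actual pos) + P(pred neg) * P(actual neg).
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    tweet = ["buy", "ice", "tonight", "lol"]  # "lol" has no embedding and is skipped
    svwe = sentence_vector(tweet, EMBEDDINGS, dim=3)
    swef = extend_features(svwe, tweet, NARCOTIC_KEYWORDS)
    print(svwe, swef)
    # Illustrative counts only, not results from the article.
    print(cohen_kappa(40, 10, 5, 45), mcc(40, 10, 5, 45))
```

Averaging is only one common way to aggregate word vectors into a sentence vector; any of the classifiers listed in the description (for example Logistic Regression) would then be trained on the resulting fixed-length rows.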
topic |
data mining; data preprocessing; feature reduction; narcotic drug; text classification; text vectorization; word embedding |
url |
https://ph01.tci-thaijo.org/index.php/easr/article/download/243616/166483/ |