Data Mining Technology Application in False Text Information Recognition
False information on the Internet is widely regarded as a serious social harm. To recognize false text information, this paper proposes a method for mining text features, applied to false drug advertisements. First, data on false drug advertisements and real d...
Main Authors: | Jie Wan, Xue Cao, Kun Yao, Donghui Yang, E. Peng, Yong Cao |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2021-01-01 |
Series: | Mobile Information Systems |
Online Access: | http://dx.doi.org/10.1155/2021/4206424 |
id |
doaj-9d31536324514fd7a3a9f50505b9242d |
---|---|
record_format |
Article |
spelling |
doaj-9d31536324514fd7a3a9f50505b9242d; 2021-07-02T13:40:51Z; eng; Hindawi Limited; Mobile Information Systems; 1875-905X; 2021-01-01; 2021; 10.1155/2021/4206424; Data Mining Technology Application in False Text Information Recognition; Jie Wan (0), Xue Cao (1), Kun Yao (2), Donghui Yang (3), E. Peng (4), Yong Cao (5); affiliations in record order: Fundamental Space Science Research Center; School of Economics and Management; Department of Mechanical Engineering & Automation; School of Economics and Management; Fundamental Space Science Research Center; Department of Mechanical Engineering & Automation; False information on the Internet is widely regarded as a serious social harm. To recognize false text information, this paper proposes a method for mining text features, applied to false drug advertisements. First, false and real drug advertisements were collected from official websites to build a database of both kinds. Second, features were extracted from the advertisement text, a feature matrix was built from the effective features, and each feature vector in the matrix was labeled positive or negative according to whether the advertisement is false. Third, several classifiers were trained and tested, the model that best identified false drug advertisements was selected, and the key features that determine the classification were identified. Finally, the best-performing model was used to predict new false drug advertisements collected from Sina Weibo. For identifying false drug advertisements, the support vector machine (SVM) classifier trained on the feature set after feature selection performed best. The findings give the government an effective method for identifying and combating false advertisements, and the study serves as a reference for using text data mining to identify and detect information fraud; http://dx.doi.org/10.1155/2021/4206424 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Jie Wan Xue Cao Kun Yao Donghui Yang E. Peng Yong Cao |
spellingShingle |
Jie Wan Xue Cao Kun Yao Donghui Yang E. Peng Yong Cao Data Mining Technology Application in False Text Information Recognition Mobile Information Systems |
author_facet |
Jie Wan Xue Cao Kun Yao Donghui Yang E. Peng Yong Cao |
author_sort |
Jie Wan |
title |
Data Mining Technology Application in False Text Information Recognition |
title_sort |
data mining technology application in false text information recognition |
publisher |
Hindawi Limited |
series |
Mobile Information Systems |
issn |
1875-905X |
publishDate |
2021-01-01 |
description |
False information on the Internet is widely regarded as a serious social harm. To recognize false text information, this paper proposes a method for mining text features, applied to false drug advertisements. First, false and real drug advertisements were collected from official websites to build a database of both kinds. Second, features were extracted from the advertisement text, a feature matrix was built from the effective features, and each feature vector in the matrix was labeled positive or negative according to whether the advertisement is false. Third, several classifiers were trained and tested, the model that best identified false drug advertisements was selected, and the key features that determine the classification were identified. Finally, the best-performing model was used to predict new false drug advertisements collected from Sina Weibo. For identifying false drug advertisements, the support vector machine (SVM) classifier trained on the feature set after feature selection performed best. The findings give the government an effective method for identifying and combating false advertisements, and the study serves as a reference for using text data mining to identify and detect information fraud. |
url |
http://dx.doi.org/10.1155/2021/4206424 |
_version_ |
1721328895218679808 |
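
The pipeline summarized in the description field (text features, a labeled feature matrix, feature selection, then an SVM) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy advertisements, the whitespace tokenizer, the chi-square selection criterion, and the subgradient-descent trainer are all assumptions chosen to keep the example self-contained; the record only states that feature selection was applied and that an SVM performed best.

```python
# Hypothetical toy corpus: +1 = false advertisement, -1 = real advertisement.
ADS = [
    ("miracle cure guaranteed no side effects", 1),
    ("guaranteed cure for all diseases", 1),
    ("miracle pill cure in three days", 1),
    ("consult your doctor before use", -1),
    ("read the label and consult a pharmacist", -1),
    ("may cause side effects consult your doctor", -1),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for the sketch)."""
    return text.lower().split()


def chi2_scores(docs, labels):
    """Chi-square score of each token's presence against the class label."""
    token_sets = [set(tokenize(d)) for d in docs]
    n = len(docs)
    scores = {}
    for tok in set().union(*token_sets):
        # 2x2 contingency table: token present/absent vs. label +1/-1.
        a = sum(1 for s, y in zip(token_sets, labels) if tok in s and y == 1)
        b = sum(1 for s, y in zip(token_sets, labels) if tok in s and y == -1)
        c = sum(1 for s, y in zip(token_sets, labels) if tok not in s and y == 1)
        d = n - a - b - c
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[tok] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_features(scores, k):
    """Keep the k highest-scoring tokens; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def vectorize(text, features):
    """Binary presence vector over the selected tokens."""
    present = set(tokenize(text))
    return [1.0 if f in present else 0.0 for f in features]


def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=50):
    """Linear SVM fitted by subgradient descent on the L2-regularized hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The regularizer shrinks the weights on every step.
            w = [wj * (1 - lr * lam) for wj in w]
            if margin < 1:  # hinge loss is active for this sample
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


docs = [text for text, _ in ADS]
labels = [label for _, label in ADS]
features = select_features(chi2_scores(docs, labels), k=2)
w, b = train_linear_svm([vectorize(d, features) for d in docs], labels)


def classify(text):
    """Return +1 (predicted false ad) or -1 (predicted real ad)."""
    score = sum(wj * xj for wj, xj in zip(w, vectorize(text, features))) + b
    return 1 if score > 0 else -1
```

Calling `classify("new miracle cure")` on this toy model labels the text as a false advertisement, because `cure` is one of the two tokens kept by the selection step. A real study would replace each piece with a tested library component and evaluate on held-out data.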