Data Mining Technology Application in False Text Information Recognition

False information on the Internet is widely regarded as a serious social harm. To recognize false text information, this paper proposes an effective method for mining text features in the field of false drug advertisements. Firstly, the data of false drug advertisements and real d...

Bibliographic Details
Main Authors: Jie Wan, Xue Cao, Kun Yao, Donghui Yang, E. Peng, Yong Cao
Format: Article
Language: English
Published: Hindawi Limited 2021-01-01
Series: Mobile Information Systems
Online Access: http://dx.doi.org/10.1155/2021/4206424
id doaj-9d31536324514fd7a3a9f50505b9242d
record_format Article
spelling doaj-9d31536324514fd7a3a9f50505b9242d
2021-07-02T13:40:51Z
eng
Hindawi Limited
Mobile Information Systems
1875-905X
2021-01-01
2021
10.1155/2021/4206424
Data Mining Technology Application in False Text Information Recognition
Jie Wan (0)
Xue Cao (1)
Kun Yao (2)
Donghui Yang (3)
E. Peng (4)
Yong Cao (5)
0 Fundamental Space Science Research Center
1 School of Economics and Management
2 Department of Mechanical Engineering & Automation
3 School of Economics and Management
4 Fundamental Space Science Research Center
5 Department of Mechanical Engineering & Automation
http://dx.doi.org/10.1155/2021/4206424
collection DOAJ
language English
format Article
sources DOAJ
author Jie Wan
Xue Cao
Kun Yao
Donghui Yang
E. Peng
Yong Cao
author_sort Jie Wan
title Data Mining Technology Application in False Text Information Recognition
title_sort data mining technology application in false text information recognition
publisher Hindawi Limited
series Mobile Information Systems
issn 1875-905X
publishDate 2021-01-01
description False information on the Internet is widely regarded as a serious social harm. To recognize false text information, this paper proposes an effective method for mining text features in the field of false drug advertisements. First, false and real drug advertisements were collected from official websites to build a database of both kinds of advertisement. Second, features were extracted from the advertisement text to build a feature matrix, and each feature vector in the matrix was labeled positive or negative according to whether the advertisement is false. Third, several classifiers were trained and tested; the model that performed best at identifying false drug advertisements was selected, and the key features that determine the classification were identified. Finally, the best-performing model was used to predict new false drug advertisements collected from Sina Weibo. For identifying false drug advertisements, the support vector machine (SVM) classifier built on the feature set after feature selection performed best. The findings can provide the government with an effective method for identifying and combating false advertisements, and the study serves as a reference for using text data mining to identify and detect information fraud.
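The pipeline summarized in the description (text features, a labeled feature matrix, feature selection, then an SVM classifier) can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the toy advertisements are invented, the feature selection is a simple document-frequency difference between classes, and the linear SVM is trained with plain sub-gradient descent on the hinge loss. The names TRAIN, select_features, vectorize, train_linear_svm and predict are all hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus, invented for illustration (not data from the paper).
# Label +1 = false advertisement, -1 = real advertisement.
TRAIN = [
    ("miracle cure guaranteed results no side effects", 1),
    ("guaranteed cure for all diseases miracle formula", 1),
    ("secret formula cure guaranteed instantly", 1),
    ("take one tablet daily consult your doctor", -1),
    ("read the label consult your doctor before use", -1),
    ("one tablet daily may cause side effects", -1),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real text needs more care)."""
    return text.lower().split()


def select_features(docs, k):
    """Keep the k words whose document frequency differs most between classes."""
    pos, neg = Counter(), Counter()
    for text, label in docs:
        (pos if label == 1 else neg).update(set(tokenize(text)))
    vocab = set(pos) | set(neg)
    # Sort by descending frequency gap, breaking ties alphabetically.
    ranked = sorted(vocab, key=lambda w: (-abs(pos[w] - neg[w]), w))
    return ranked[:k]


def vectorize(text, features):
    """Binary presence vector: one row of the feature matrix."""
    tokens = set(tokenize(text))
    return [1.0 if f in tokens else 0.0 for f in features]


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100):
    """Sub-gradient descent on the L2-regularized hinge loss (linear SVM)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Shrink weights (regularization step).
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Push toward the example only if it violates the margin.
            if margin < 1.0:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


FEATURES = select_features(TRAIN, k=6)
X_TRAIN = [vectorize(text, FEATURES) for text, _ in TRAIN]
Y_TRAIN = [label for _, label in TRAIN]
W, B = train_linear_svm(X_TRAIN, Y_TRAIN)


def predict(text):
    """Return +1 (false advertisement) or -1 (real advertisement)."""
    x = vectorize(text, FEATURES)
    score = sum(wj * xj for wj, xj in zip(W, x)) + B
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    print(FEATURES)
    print(predict("miracle cure guaranteed"))
```

In practice one would more likely compare several library classifiers on held-out data, as the description says the study did; the hand-written trainer here only keeps the sketch dependency-free.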
url http://dx.doi.org/10.1155/2021/4206424
work_keys_str_mv AT jiewan dataminingtechnologyapplicationinfalsetextinformationrecognition
AT xuecao dataminingtechnologyapplicationinfalsetextinformationrecognition
AT kunyao dataminingtechnologyapplicationinfalsetextinformationrecognition
AT donghuiyang dataminingtechnologyapplicationinfalsetextinformationrecognition
AT epeng dataminingtechnologyapplicationinfalsetextinformationrecognition
AT yongcao dataminingtechnologyapplicationinfalsetextinformationrecognition
_version_ 1721328895218679808