An Effective Identification Technology for Online News Comment Spammers in Internet Media

The development of the mobile Internet is changing the way we communicate with others. Internet media, including online news and social networks, have gradually become the main mobile crowdsourcing applications for information dissemination and user communication. However, the potential business...

Bibliographic Details
Main Authors: Huayou Si, Wen Sun, Jilin Zhang, Jian Wan, Neal N. Xiong, Li Zhou, Yongjian Ren
Format: Article
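Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Spammer identification; Internet media; online news comment; label propagation algorithm; mobile crowdsourcing applications
Online Access: https://ieeexplore.ieee.org/document/8648390/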
id doaj-12fabd1476f0439290c533e9939aa9e8
record_format Article
spelling doaj-12fabd1476f0439290c533e9939aa9e8
2021-04-05T17:00:19Z
eng
IEEE
IEEE Access
2169-3536
2019-01-01
7
37792
37806
10.1109/ACCESS.2019.2900474
8648390
An Effective Identification Technology for Online News Comment Spammers in Internet Media
Huayou Si, School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China (https://orcid.org/0000-0002-8022-923X)
Wen Sun, School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
Jilin Zhang, School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
Jian Wan, School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
Neal N. Xiong, College of Intelligence and Computing, Tianjin University, Tianjin, China (https://orcid.org/0000-0002-0394-4635)
Li Zhou, School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
Yongjian Ren, School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
https://ieeexplore.ieee.org/document/8648390/
Spammer identification
Internet media
online news comment
label propagation algorithm
mobile crowdsourcing applications
collection DOAJ
language English
format Article
sources DOAJ
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description The development of the mobile Internet is changing the way we communicate with others. Internet media, including online news and social networks, have gradually become the main mobile crowdsourcing applications for information dissemination and user communication. However, the potential business opportunities have stimulated the emergence of a large number of spammers, who post false statements, advertisements, pornographic content, and links to phishing websites to gain commercial benefits, which seriously degrades the experience of normal users. To reduce the harm of false information, spammer identification has therefore been researched extensively. However, traditional spammer identification technologies involve high data costs and perform poorly, and most of them concentrate on social networks, with less research on online news. In this paper, we propose an effective technology for identifying online news comment spammers based on the label propagation algorithm (LPA), making full use of user comment behaviors and contents. First, we collect a large amount of news and comments from NetEase News and manually label some users in the data as spammers or normal users to construct a labeled dataset. Then, a set of behavioral and semantic features is extracted and quantified from the user comment behaviors and comment contents by statistical analysis. Next, we propose the identification technology based on the LPA. Finally, the feature values are input into the proposed technology in different combinations, and experiments and evaluations are carried out to determine the most effective combination of features and improve the technology. The results show that the proposed technology involves a lower data cost but achieves a better identification effect than some traditional technologies based on supervised classifiers.
topic Spammer identification
Internet media
online news comment
label propagation algorithm
mobile crowdsourcing applications
url https://ieeexplore.ieee.org/document/8648390/
work_keys_str_mv AT huayousi aneffectiveidentificationtechnologyforonlinenewscommentspammersininternetmedia
AT wensun aneffectiveidentificationtechnologyforonlinenewscommentspammersininternetmedia
AT jilinzhang aneffectiveidentificationtechnologyforonlinenewscommentspammersininternetmedia
AT jianwan aneffectiveidentificationtechnologyforonlinenewscommentspammersininternetmedia
AT nealnxiong aneffectiveidentificationtechnologyforonlinenewscommentspammersininternetmedia
AT lizhou aneffectiveidentificationtechnologyforonlinenewscommentspammersininternetmedia
AT yongjianren aneffectiveidentificationtechnologyforonlinenewscommentspammersininternetmedia
AT huayousi effectiveidentificationtechnologyforonlinenewscommentspammersininternetmedia
AT wensun effectiveidentificationtechnologyforonlinenewscommentspammersininternetmedia
AT jilinzhang effectiveidentificationtechnologyforonlinenewscommentspammersininternetmedia
AT jianwan effectiveidentificationtechnologyforonlinenewscommentspammersininternetmedia
AT nealnxiong effectiveidentificationtechnologyforonlinenewscommentspammersininternetmedia
AT lizhou effectiveidentificationtechnologyforonlinenewscommentspammersininternetmedia
AT yongjianren effectiveidentificationtechnologyforonlinenewscommentspammersininternetmedia
_version_ 1721540443844378624
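
Illustrative note on the method named in the description: the abstract says that a few manually labeled users and a set of per-user feature values are fed to a label propagation algorithm (LPA). The record does not give the authors' features, graph construction, or parameters, so the following is only a minimal sketch of a generic clamped label propagation, not the paper's implementation. The two features (normalized comment rate, share of comments containing links), the Gaussian similarity weights, and the names `label_propagation` and `sigma` are all assumptions made for illustration.

```python
import math


def label_propagation(features, labels, sigma=0.3, max_iter=200, tol=1e-6):
    """Generic clamped label propagation (illustrative, not the paper's code).

    features: list of equal-length numeric vectors, one per user.
    labels:   1 (spammer), 0 (normal user) or None (unlabeled), one per user.
    Returns one spammer score in [0, 1] per user.
    """
    n = len(features)

    def weight(a, b):
        # Gaussian similarity between two feature vectors (assumed choice).
        d2 = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-d2 / (2 * sigma ** 2))

    # Fully connected similarity graph without self-loops.
    w = [[weight(features[i], features[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]

    # Row-normalize the weights into transition probabilities.
    t = []
    for row in w:
        total = sum(row)
        t.append([v / total if total > 0 else 0.0 for v in row])

    # Labeled users start at their label, unlabeled users at 0.5.
    score = [0.5 if y is None else float(y) for y in labels]
    for _ in range(max_iter):
        new = [sum(t[i][j] * score[j] for j in range(n)) for i in range(n)]
        # Clamp: manually labeled users keep their known label.
        for i, y in enumerate(labels):
            if y is not None:
                new[i] = float(y)
        delta = max(abs(a - b) for a, b in zip(new, score))
        score = new
        if delta < tol:
            break
    return score


# Hypothetical toy data: [comment rate, share of comments with links].
users = [
    [0.90, 0.80],  # manually labeled spammer
    [0.10, 0.00],  # manually labeled normal user
    [0.85, 0.75],  # unlabeled, behaves like the spammer
    [0.15, 0.05],  # unlabeled, behaves like the normal user
]
labels = [1, 0, None, None]

scores = label_propagation(users, labels)
predicted_spammers = [i for i, s in enumerate(scores) if s > 0.5]
print(predicted_spammers)
```

Only two of the four toy users carry a manual label; the other two receive their scores from their similarity to the labeled ones, which is the sense in which a semi-supervised approach of this kind needs less labeled data than a supervised classifier.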