Content Noise Detection Model Using Deep Learning in Web Forums

Spam posts in web forum discussions cause user inconvenience and lower the value of the web forum as an open source of user opinion. In this regard, as the importance of a web post is evaluated in terms of the number of involved authors, noise distorts the analysis results by adding unnecessary data...

Bibliographic Details
Main Authors:	Jiyoung Woo, Jaeseok Yun
Format:	Article
Language:	English
Published:	MDPI AG 2020-06-01
Series:	Sustainability
Subjects:	web forum; social media; content noise; posting quality; text mining; deep learning
Online Access:	https://www.mdpi.com/2071-1050/12/12/5074
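
Note on the reported metrics: this record's description quotes an overall accuracy together with precision and recall for the noise (spam) class. The sketch below shows how those three figures are conventionally computed from true and predicted labels. It is an illustration only, not the authors' evaluation code; the function name noise_class_metrics, the label encoding (1 = noise, 0 = normal) and the toy label lists are assumptions made for the example and are not taken from the article.

```python
def noise_class_metrics(y_true, y_pred, noise_label=1):
    """Return (accuracy, precision, recall), treating `noise_label` as the positive class.

    Hypothetical helper for illustration; the label encoding is an assumption.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")

    pairs = list(zip(y_true, y_pred))
    # True positives: noise posts correctly flagged as noise.
    tp = sum(1 for t, p in pairs if t == noise_label and p == noise_label)
    # False positives: normal posts wrongly flagged as noise.
    fp = sum(1 for t, p in pairs if t != noise_label and p == noise_label)
    # False negatives: noise posts that were missed.
    fn = sum(1 for t, p in pairs if t == noise_label and p != noise_label)

    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when the class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy labels invented for this example (1 = noise, 0 = normal post).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
accuracy, precision, recall = noise_class_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Precision answers "of the posts flagged as noise, how many really were noise", while recall answers "of the real noise posts, how many were flagged"; reporting both for the noise class guards against a model that looks accurate only because one class dominates the data.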

id	doaj-2da6fd0e5269498faf5ce290f797b211
record_format	Article
spelling	doaj-2da6fd0e5269498faf5ce290f797b211; 2020-11-25T02:31:20Z; eng; MDPI AG; Sustainability; 2071-1050; 2020-06-01; 12; 5074; 5074; 10.3390/su12125074; Content Noise Detection Model Using Deep Learning in Web Forums; Jiyoung Woo (0); Jaeseok Yun (1); (0) Department of Big Data Engineering, Soonchunhyang University, Asan-si 31538, Korea; (1) Department of Internet of Things, Soonchunhyang University, Asan-si 31538, Korea; https://www.mdpi.com/2071-1050/12/12/5074; web forum; social media; content noise; posting quality; text mining; deep learning
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Jiyoung Woo Jaeseok Yun
title	Content Noise Detection Model Using Deep Learning in Web Forums
publisher	MDPI AG
series	Sustainability
issn	2071-1050
publishDate	2020-06-01
description	Spam posts in web forum discussions cause user inconvenience and lower the value of the web forum as an open source of user opinion. In this regard, as the importance of a web post is evaluated in terms of the number of involved authors, noise distorts the analysis results by adding unnecessary data to the opinion analysis. Here, in this work, an automatic detection model for spam posts in web forums using both conventional machine learning and deep learning is proposed. To automatically differentiate between normal posts and spam, evaluators were asked to recognize spam posts in advance. To construct the machine learning-based model, text features from posted content using text mining techniques from the perspective of linguistics were extracted, and supervised learning was performed to distinguish content noise from normal posts. For the deep learning model, raw text including and excluding special characters was utilized. A comparison analysis on deep neural networks using the two different recurrent neural network (RNN) models of the simple RNN and long short-term memory (LSTM) network was also performed. Furthermore, the proposed model was applied to two web forums. The experimental results indicate that the deep learning model affords significant improvements over the accuracy of conventional machine learning associated with text features. The accuracy of the proposed model using LSTM reaches 98.56%, and the precision and recall of the noise class reach 99% and 99.53%, respectively.
topic	web forum; social media; content noise; posting quality; text mining; deep learning
url	https://www.mdpi.com/2071-1050/12/12/5074
