AN ENSEMBLE FEATURE SELECTION METHOD TO DETECT WEB SPAM

Feature selection is an important problem in data mining; it is used to reduce the dimensionality of a feature set. Web spam detection is one of the research fields of data mining. Given the growing amount of information available online and users' need to search it, the role of search engines and u...


Bibliographic Details
Main Authors: Mahdieh Danandeh Oskouei, Seyed Naser Razavi
Format: Article
Language: English
Published: UKM Press 2018-12-01
Series: Asia-Pacific Journal of Information Technology and Multimedia
Subjects: ensemble feature selection; web spam; ranking; machine learning
Online Access: https://www.ukm.my/apjitm/view.php?id=28
id doaj-5039225e40fe44c0b582c52c9bfc91a9
record_format Article
spelling doaj-5039225e40fe44c0b582c52c9bfc91a9
2021-06-21T07:02:46Z
eng
UKM Press
Asia-Pacific Journal of Information Technology and Multimedia
2289-2192
2018-12-01
7
02
99
113
https://doi.org/10.17576/apjitm-2018-0702-08
AN ENSEMBLE FEATURE SELECTION METHOD TO DETECT WEB SPAM
Mahdieh Danandeh Oskouei
Seyed Naser Razavi
Feature selection is an important problem in data mining; it is used to reduce the dimensionality of a feature set. Web spam detection is one of the research fields of data mining. Given the growing amount of information available online and users' need to search it, search engines and the ranking algorithms they use play an important role. Web spam is an illegal method of falsely inflating the rank of web pages by deceiving search engine algorithms, so an efficient detection method is essential. Many methods have been proposed to date to combat web spam. This paper proposes an ensemble feature selection method to detect web spam. The content features of the standard WEBSPAM-UK2007 dataset are used for evaluation. A Bayes network classifier is used with a 70-30% training-testing split of the dataset. The results show that the AUC of the proposed method is higher than that of the other methods reported in the paper. Moreover, the best values of the evaluation metrics obtained by the proposed method are better than those of the other methods reported in the paper. In addition, the method improves classification metrics compared with basic feature selection methods.
https://www.ukm.my/apjitm/view.php?id=28
ensemble feature selection; web spam; ranking; machine learning.
collection DOAJ
language English
format Article
sources DOAJ
author Mahdieh Danandeh Oskouei
Seyed Naser Razavi
title AN ENSEMBLE FEATURE SELECTION METHOD TO DETECT WEB SPAM
publisher UKM Press
series Asia-Pacific Journal of Information Technology and Multimedia
issn 2289-2192
publishDate 2018-12-01
description Feature selection is an important problem in data mining; it is used to reduce the dimensionality of a feature set. Web spam detection is one of the research fields of data mining. Given the growing amount of information available online and users' need to search it, search engines and the ranking algorithms they use play an important role. Web spam is an illegal method of falsely inflating the rank of web pages by deceiving search engine algorithms, so an efficient detection method is essential. Many methods have been proposed to date to combat web spam. This paper proposes an ensemble feature selection method to detect web spam. The content features of the standard WEBSPAM-UK2007 dataset are used for evaluation. A Bayes network classifier is used with a 70-30% training-testing split of the dataset. The results show that the AUC of the proposed method is higher than that of the other methods reported in the paper. Moreover, the best values of the evaluation metrics obtained by the proposed method are better than those of the other methods reported in the paper. In addition, the method improves classification metrics compared with basic feature selection methods.
topic ensemble feature selection; web spam; ranking; machine learning.
url https://www.ukm.my/apjitm/view.php?id=28
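
The description above names the ingredients of the evaluation (an ensemble of feature selectors, a 70-30% training-testing split, and AUC as the metric) but does not say which base selectors are combined or how their outputs are merged. The sketch below is one possible reading under stated assumptions, not the authors' method: three hypothetical filter scorers (`mean_gap`, `fisher`, `solo_auc`) are combined by averaging their feature ranks, a small Gaussian naive Bayes model stands in for the Bayes network classifier, and synthetic data stands in for the WEBSPAM-UK2007 content features.

```python
import math
import random
from statistics import mean, pstdev


def split_by_class(column, labels):
    pos = [v for v, t in zip(column, labels) if t == 1]
    neg = [v for v, t in zip(column, labels) if t == 0]
    return pos, neg


def auc(scores, labels):
    """Probability that a spam row outscores a non-spam row (ties count 0.5)."""
    pos, neg = split_by_class(scores, labels)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Three hypothetical base filter selectors; higher score = more relevant feature.
def mean_gap(column, labels):
    pos, neg = split_by_class(column, labels)
    return abs(mean(pos) - mean(neg))


def fisher(column, labels):
    pos, neg = split_by_class(column, labels)
    return (mean(pos) - mean(neg)) ** 2 / (pstdev(pos) ** 2 + pstdev(neg) ** 2 + 1e-12)


def solo_auc(column, labels):
    a = auc(column, labels)
    return max(a, 1.0 - a)


def ensemble_select(X, y, scorers, k):
    """Rank features under each scorer, average the ranks, keep the k best."""
    columns = list(zip(*X))
    mean_rank = [0.0] * len(columns)
    for scorer in scorers:
        scores = [scorer(col, y) for col in columns]
        order = sorted(range(len(columns)), key=lambda j: -scores[j])
        for position, j in enumerate(order):
            mean_rank[j] += position / len(scorers)
    return sorted(range(len(columns)), key=lambda j: mean_rank[j])[:k]


def fit_gaussian_nb(X, y, features):
    """Per-class mean and standard deviation of each selected feature."""
    model = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [(mean(r[j] for r in rows), pstdev([r[j] for r in rows]) + 1e-9)
                        for j in features]
    return model


def spam_log_odds(model, x, features):
    """Log-likelihood ratio of spam vs. non-spam (class prior omitted; AUC ignores it)."""
    total = 0.0
    for j, (m1, s1), (m0, s0) in zip(features, model[1], model[0]):
        total += -math.log(s1) - (x[j] - m1) ** 2 / (2 * s1 ** 2)
        total -= -math.log(s0) - (x[j] - m0) ** 2 / (2 * s0 ** 2)
    return total


# Synthetic stand-in data: 6 features, only features 0 and 1 carry signal.
rng = random.Random(0)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(2.0 * label if j < 2 else 0.0, 1.0) for j in range(6)] for label in y]

# 70-30 training-testing split.
indices = list(range(len(X)))
rng.shuffle(indices)
train, test = indices[:140], indices[140:]
X_train, y_train = [X[i] for i in train], [y[i] for i in train]
X_test, y_test = [X[i] for i in test], [y[i] for i in test]

# Features are selected on the training rows only, then scored on held-out rows.
selected = ensemble_select(X_train, y_train, [mean_gap, fisher, solo_auc], k=2)
model = fit_gaussian_nb(X_train, y_train, selected)
test_auc = auc([spam_log_odds(model, x, selected) for x in X_test], y_test)
print("selected features:", sorted(selected))
print("test AUC:", round(test_auc, 3))
```

Averaging ranks rather than raw scores is a deliberate choice here: the base scorers live on different scales, and ranks make them comparable without normalisation. Other aggregation rules (voting on top-k sets, weighted ranks) would fit the same skeleton.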