CBFS: A Clustering-Based Feature Selection Mechanism for Network Anomaly Detection

Network traffic flows contain a large number of correlated and redundant features that significantly degrade the performance of data-driven network anomaly detection. In this paper, we propose a novel clustering- and ranking-based feature selection scheme, termed CBFS, to reduce redundant features...


Bibliographic Details
Main Authors:	Jiewen Mao, Yongquan Hu, Dong Jiang, Tongquan Wei, Fuke Shen
Format:	Article
Language:	English
Published:	IEEE, 2020
Series:	IEEE Access
Subjects:	Feature selection; clustering; information gain; classification; decision tree; intrusion detection
Online Access:	https://ieeexplore.ieee.org/document/9123904/

id	doaj-465fc0081ac34463a28b0d9caf3733a7
citation	IEEE Access, vol. 8, pp. 116216-116225, 2020
doi	10.1109/ACCESS.2020.3004699
affiliation	School of Computer Science and Technology, East China Normal University, Shanghai, China (all five authors)
orcid	Jiewen Mao: https://orcid.org/0000-0002-7533-7885
collection	DOAJ
issn	2169-3536
description	Network traffic flows contain a large number of correlated and redundant features that significantly degrade the performance of data-driven network anomaly detection. In this paper, we propose a novel clustering- and ranking-based feature selection scheme, termed CBFS, to reduce redundant features in network traffic, which can greatly improve the efficiency and accuracy of feature-based network anomaly detection. Our proposed CBFS scheme first calculates the distance between feature vectors, merges these feature vectors into different clusters, and selects the center of each cluster as a representative feature vector. The proposed CBFS scheme then integrates the information gain and gain rate of features to further reduce the number of features among the cluster representatives. Finally, the proposed CBFS scheme applies a decision-tree-based classifier to the generated feature subset to detect abnormal traffic flows. The experimental results show that our proposed CBFS scheme is effective in reducing feature dimensions across different datasets. The proposed CBFS scheme can achieve feature reduction rates of 20% to 70%, and cost-performance of up to 70% compared with benchmarking methods.
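The abstract outlines a cluster-then-rank pipeline. The Python below is a minimal sketch of its first two stages under stated assumptions; it is not the authors' implementation, and the record does not specify the distance measure, clustering algorithm, or how the two ranking scores are combined. Assumed here: the distance between two feature vectors is 1 - |Pearson correlation|; clusters are formed by simple leader clustering with a hypothetical `threshold`; each cluster's "center" is its medoid; features are discrete-valued; "gain rate" is read as the C4.5 gain ratio; and representatives are ranked by the mean of information gain and gain ratio (a hypothetical combination rule). All function names and the `keep` parameter are illustrative, and the decision-tree classification stage is omitted.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def correlation_distance(a, b):
    """Assumed distance: 1 - |Pearson r| between two feature columns."""
    sa, sb = pstdev(a), pstdev(b)
    if sa == 0 or sb == 0:
        return 1.0  # a constant column is treated as uncorrelated with everything
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return 1.0 - abs(cov / (sa * sb))


def cluster_features(columns, threshold):
    """Leader clustering: a feature joins the first cluster whose leader
    (first member) is within `threshold`; otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of column indices
    for j, col in enumerate(columns):
        for cluster in clusters:
            if correlation_distance(columns[cluster[0]], col) <= threshold:
                cluster.append(j)
                break
        else:
            clusters.append([j])
    return clusters


def medoid(cluster, columns):
    """Member with the smallest total distance to the rest of its cluster."""
    return min(cluster, key=lambda i: sum(
        correlation_distance(columns[i], columns[k]) for k in cluster))


def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain_and_ratio(feature, labels):
    """Information gain of a discrete feature w.r.t. the labels, plus gain ratio."""
    n = len(labels)
    conditional = 0.0
    for v, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == v]
        conditional += (count / n) * entropy(subset)
    gain = entropy(labels) - conditional
    split_info = entropy(feature)  # intrinsic information of the split
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


def select_features(rows, labels, threshold=0.2, keep=2):
    """Cluster feature columns, keep each cluster's medoid, then rank the
    medoids by the mean of information gain and gain ratio (hypothetical rule)."""
    columns = [list(col) for col in zip(*rows)]
    representatives = [medoid(c, columns)
                       for c in cluster_features(columns, threshold)]
    ranked = sorted(representatives,
                    key=lambda j: -sum(info_gain_and_ratio(columns[j], labels)) / 2)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    f0 = [0, 0, 1, 1, 0, 1, 0, 1]
    f1 = [2 * x for x in f0]            # perfectly correlated copy of f0
    f2 = [1, 0, 1, 0, 0, 1, 1, 0]       # uncorrelated with f0
    rows = list(zip(f0, f1, f2))
    print(select_features(rows, labels=f0, threshold=0.2, keep=1))  # → [0]
```

In the toy run, `f1` is merged into `f0`'s cluster because their correlation distance is 0, and `f2` forms its own cluster; ranking then keeps `f0`, whose information gain against the labels is 1 bit versus 0 for `f2`.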


Similar Items