CBFS: A Clustering-Based Feature Selection Mechanism for Network Anomaly Detection

Network traffic flows contain a large number of correlated and redundant features that significantly degrade the performance of data-driven network anomaly detection. In this paper, we propose a novel clustering- and ranking-based feature selection scheme, termed CBFS, to reduce redundant features...


Bibliographic Details
Main Authors: Jiewen Mao, Yongquan Hu, Dong Jiang, Tongquan Wei, Fuke Shen
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
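Subjects: Feature selection; clustering; information gain; classification; decision tree; intrusion detection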
Online Access: https://ieeexplore.ieee.org/document/9123904/
id doaj-465fc0081ac34463a28b0d9caf3733a7
record_format Article
spelling doaj-465fc0081ac34463a28b0d9caf3733a7
2021-03-30T02:26:32Z
eng
IEEE
IEEE Access
ISSN 2169-3536
2020-01-01
Volume 8, pages 116216-116225
DOI 10.1109/ACCESS.2020.3004699
Article number 9123904
CBFS: A Clustering-Based Feature Selection Mechanism for Network Anomaly Detection
Jiewen Mao (https://orcid.org/0000-0002-7533-7885)
Yongquan Hu
Dong Jiang
Tongquan Wei
Fuke Shen
All authors: School of Computer Science and Technology, East China Normal University, Shanghai, China
https://ieeexplore.ieee.org/document/9123904/
Feature selection; clustering; information gain; classification; decision tree; intrusion detection
collection DOAJ
language English
format Article
sources DOAJ
author Jiewen Mao
Yongquan Hu
Dong Jiang
Tongquan Wei
Fuke Shen
title CBFS: A Clustering-Based Feature Selection Mechanism for Network Anomaly Detection
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description Network traffic flows contain a large number of correlated and redundant features that significantly degrade the performance of data-driven network anomaly detection. In this paper, we propose a novel clustering- and ranking-based feature selection scheme, termed CBFS, to reduce redundant features in network traffic, which can greatly improve the efficiency and accuracy of feature-based network anomaly detection. CBFS first calculates the distance between feature vectors, merges the feature vectors into clusters, and selects the center of each cluster as a representative feature vector. It then combines the information gain and gain rate of the features to further reduce the number of features retained after clustering. Finally, it applies a decision-tree-based classifier to the resulting feature subset to detect abnormal traffic flows. The experimental results show that CBFS effectively reduces feature dimensions across different datasets, achieving feature reduction rates of 20% to 70% and cost-performance of up to 70% compared with benchmark methods.
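The two selection stages named in the description (cluster the feature vectors and keep one representative per cluster, then rank the representatives by information gain and gain rate) can be illustrated with a minimal Python sketch. This record does not give the paper's distance measure, clustering procedure, or scoring formula, so everything concrete here is an assumption made for illustration: the distance is 1 - |Pearson correlation|, clustering is a greedy single pass against each cluster's first member, the "center" is taken as the cluster medoid, "gain rate" is read as the usual gain ratio, and the two scores are simply summed. The function names, the `threshold` and `top_k` parameters, and the toy data are hypothetical. The final decision-tree classification step is left out.

```python
import math
from collections import Counter


def correlation_distance(u, v):
    """Assumed distance: 1 - |Pearson r| between two feature columns."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # a constant column is treated as unrelated to everything
    return 1.0 - abs(cov / (norm_u * norm_v))


def cluster_features(columns, threshold):
    """Greedy pass: join the first cluster whose first member is close enough."""
    clusters = []
    for j, col in enumerate(columns):
        for cluster in clusters:
            if correlation_distance(columns[cluster[0]], col) <= threshold:
                cluster.append(j)
                break
        else:
            clusters.append([j])
    return clusters


def medoid(columns, cluster):
    """Member with the smallest total distance to the other members."""
    return min(
        cluster,
        key=lambda j: sum(
            correlation_distance(columns[j], columns[k]) for k in cluster if k != j
        ),
    )


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain_and_ratio(feature, labels):
    """Information gain and gain ratio of a discrete feature w.r.t. the labels."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(feature)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


def select_features(columns, labels, threshold=0.2, top_k=2):
    """Cluster, keep one medoid per cluster, then keep the top_k by gain + ratio."""
    representatives = [medoid(columns, c) for c in cluster_features(columns, threshold)]
    ranked = sorted(
        representatives,
        key=lambda j: sum(info_gain_and_ratio(columns[j], labels)),
        reverse=True,
    )
    return sorted(ranked[:top_k])


# Toy data (hypothetical): each inner list is one feature column over 8 flows.
columns = [
    [0, 0, 1, 1, 0, 1, 0, 1],  # informative feature
    [0, 0, 2, 2, 0, 2, 0, 2],  # scaled copy of the first feature (redundant)
    [1, 0, 1, 0, 1, 0, 1, 0],  # weakly related to the labels
    [3, 3, 3, 3, 3, 3, 3, 3],  # constant feature, carries no information
]
labels = [0, 0, 1, 1, 0, 1, 0, 1]  # 1 = anomalous flow, 0 = normal flow

selected = select_features(columns, labels, threshold=0.2, top_k=2)
print(selected)
```

In this toy setup the redundant pair collapses into one cluster, so at most one of the two correlated columns can survive, and the constant column scores zero on both measures. Continuous traffic features would need to be discretized before the entropy-based scores apply; that step is also outside this sketch.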
topic Feature selection
clustering
information gain
classification
decision tree
intrusion detection
url https://ieeexplore.ieee.org/document/9123904/