CBFS: A Clustering-Based Feature Selection Mechanism for Network Anomaly Detection
Main Authors: | Jiewen Mao, Yongquan Hu, Dong Jiang, Tongquan Wei, Fuke Shen |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/9123904/ |
id |
doaj-465fc0081ac34463a28b0d9caf3733a7 |
---|---|
doi |
10.1109/ACCESS.2020.3004699 |
volume |
8 |
pages |
116216-116225 |
affiliation |
School of Computer Science and Technology, East China Normal University, Shanghai, China (all authors) |
orcid |
Jiewen Mao: https://orcid.org/0000-0002-7533-7885 |
record_updated |
2021-03-30T02:26:32Z |
collection |
DOAJ |
author |
Jiewen Mao, Yongquan Hu, Dong Jiang, Tongquan Wei, Fuke Shen |
issn |
2169-3536 |
description |
Network traffic flows contain a large number of correlated and redundant features that significantly degrade the performance of data-driven network anomaly detection. In this paper, we propose a novel clustering- and ranking-based feature selection scheme, termed CBFS, to reduce redundant features in network traffic, which can greatly improve the efficiency and accuracy of feature-based network anomaly detection. The proposed CBFS scheme first calculates the distance between feature vectors, merges these feature vectors into clusters, and selects the center of each cluster as a representative feature vector. It then combines the information gain and gain rate of the features to further reduce the number of features that remain after clustering. Finally, CBFS applies a decision-tree-based classifier to the resulting feature subset to detect abnormal traffic flows. Experimental results show that CBFS is effective in reducing feature dimensions across different datasets. The proposed CBFS scheme can achieve feature reduction rates of 20% to 70%, and cost-performance of up to 70% compared with benchmark methods. |
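The cluster-then-rank pipeline described in the abstract can be illustrated with a small sketch. It is a hypothetical illustration, not the authors' implementation, and every concrete choice is an assumption because the record does not specify it: the feature distance is taken as 1 - |Pearson r|; clustering is single-linkage cut at a distance `threshold`; the "center" of a cluster is its medoid; "gain rate" is read as the standard information gain ratio (information gain divided by split information); and the two scores are simply averaged. The names `feature_distance`, `cluster_features`, `medoid`, and `cbfs_like_select` are invented for this example, and feature values are assumed to be already discretized.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def feature_distance(x, y):
    # Assumed distance: strongly (anti-)correlated, i.e. redundant, features are close.
    return 1.0 - abs(pearson(x, y))

def cluster_features(columns, threshold):
    """Single-linkage clustering cut at `threshold`: connected components of the
    graph that links every pair of features whose distance is <= threshold."""
    parent = list(range(len(columns)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if feature_distance(columns[i], columns[j]) <= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(columns)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def medoid(cluster, columns):
    """Cluster 'center': the member with the smallest total distance to the others."""
    return min(cluster, key=lambda i: sum(
        feature_distance(columns[i], columns[j]) for j in cluster))

def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def info_gain(feature, labels):
    n = len(labels)
    cond = 0.0  # conditional entropy H(labels | feature)
    for v, cnt in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == v]
        cond += cnt / n * entropy(subset)
    return entropy(labels) - cond

def gain_ratio(feature, labels):
    split = entropy(feature)  # split information of the feature itself
    return info_gain(feature, labels) / split if split > 0 else 0.0

def cbfs_like_select(columns, labels, threshold=0.1, keep=None):
    """Keep one representative per redundant cluster, then rank representatives."""
    reps = [medoid(c, columns) for c in cluster_features(columns, threshold)]
    # Assumed ranking rule: mean of information gain and gain ratio.
    score = {i: (info_gain(columns[i], labels) + gain_ratio(columns[i], labels)) / 2
             for i in reps}
    ranked = sorted(reps, key=lambda i: score[i], reverse=True)
    return ranked if keep is None else ranked[:keep]

# Toy data (made up): feature 1 is a scaled copy of feature 0; feature 2 is mostly noise.
labels = [0, 0, 1, 1, 0, 1]
columns = [
    [1, 1, 2, 2, 1, 2],
    [2, 2, 4, 4, 2, 4],
    [1, 2, 1, 2, 1, 2],
]
print(cbfs_like_select(columns, labels))          # -> [0, 2]
print(cbfs_like_select(columns, labels, keep=1))  # -> [0]
```

In the toy data the redundant pair collapses into one representative, and the remaining features are ordered by how much class information they carry. The `threshold` controls how aggressively correlated features are merged. Continuous flow statistics such as durations or byte counts would need binning before the entropy-based scores are meaningful. The final decision-tree step is omitted here; the selected column indices could be passed to any decision-tree learner, for example scikit-learn's `DecisionTreeClassifier`.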
topic |
Feature selection; clustering; information gain; classification; decision tree; intrusion detection |