Summary: | Network traffic flows contain many correlated and redundant features that significantly degrade the performance of data-driven network anomaly detection. In this paper, we propose a novel clustering- and ranking-based feature selection scheme, termed CBFS, that reduces redundant features in network traffic and thereby improves the efficiency and accuracy of feature-based network anomaly detection. CBFS first calculates the distances between feature vectors, merges the feature vectors into clusters, and selects the center of each cluster as a representative feature vector. It then combines the information gain and gain ratio of the features to further reduce the number of features retained after clustering. Finally, CBFS applies a decision-tree-based classifier to the resulting feature subset to detect abnormal traffic flows. Experimental results show that CBFS effectively reduces feature dimensionality across different datasets, achieving feature reduction rates of 20% to 70% and a cost-performance of up to 70% compared with benchmark methods.
|
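The cluster-then-rank pipeline described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the distance measure (one minus the absolute Pearson correlation), the greedy threshold clustering, the medoid used as the cluster center, the averaged gain/gain-ratio score, and all function names (`cluster_features`, `cluster_center`, `gain_and_ratio`, `select_features`) are assumptions chosen for illustration, and the toy data is made up. Features are assumed to be discrete so that information gain can be computed directly.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(values).values())


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def distance(a, b):
    """Assumed feature distance: 0 for perfectly correlated features."""
    return 1.0 - abs(pearson(a, b))


def cluster_features(columns, threshold):
    """Greedy clustering: join the first cluster whose seed is close enough."""
    clusters = []
    for j, col in enumerate(columns):
        for members in clusters:
            if distance(col, columns[members[0]]) <= threshold:
                members.append(j)
                break
        else:
            clusters.append([j])
    return clusters


def cluster_center(members, columns):
    """Medoid: the member with the smallest total distance to the others."""
    return min(
        members,
        key=lambda i: sum(distance(columns[i], columns[k]) for k in members),
    )


def gain_and_ratio(feature, labels):
    """Information gain and gain ratio of a discrete feature w.r.t. labels."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(feature)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


def select_features(columns, labels, threshold=0.2, top_k=1):
    """Keep one representative per cluster, then rank by an assumed score."""
    reps = [cluster_center(m, columns) for m in cluster_features(columns, threshold)]

    def score(i):
        gain, ratio = gain_and_ratio(columns[i], labels)
        return (gain + ratio) / 2  # assumed way of combining the two measures

    return sorted(reps, key=score, reverse=True)[:top_k]


# Toy data (made up): feature 1 duplicates feature 0; feature 2 is uninformative.
columns = [
    [0, 0, 1, 1, 0, 1, 0, 1],
    [0, 0, 2, 2, 0, 2, 0, 2],
    [0, 1, 0, 1, 0, 1, 1, 0],
]
labels = [0, 0, 1, 1, 0, 1, 0, 1]
selected = select_features(columns, labels)
print(selected)
```

The indices in `selected` would then define the feature subset passed to a decision-tree classifier; that final step is omitted here to keep the sketch free of third-party dependencies.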