Empirical studies on the impact of filter‐based ranking feature selection on security vulnerability prediction

Abstract Security vulnerability prediction (SVP) can construct models to identify potentially vulnerable program modules via machine learning. Two kinds of features from different points of view are used to measure the extracted modules in previous studies. One kind considers traditional software me...

Bibliographic Details
Main Authors:	Xiang Chen, Zhidan Yuan, Zhanqi Cui, Dun Zhang, Xiaolin Ju
Format:	Article
Language:	English
Published:	Wiley 2021-02-01
Series:	IET Software
Online Access:	https://doi.org/10.1049/sfw2.12006

id	doaj-f32ae304c4c7476486cf615ef47df5ad
record_format	Article
spelling	doaj-f32ae304c4c7476486cf615ef47df5ad; 2021-08-02T08:25:07Z; eng; Wiley; IET Software; ISSN 1751-8806, 1751-8814; 2021-02-01; vol. 15, no. 1, pp. 75-89; 10.1049/sfw2.12006; Empirical studies on the impact of filter‐based ranking feature selection on security vulnerability prediction; Xiang Chen (School of Information Science and Technology, Nantong University, Nantong, China); Zhidan Yuan (School of Information Science and Technology, Nantong University, Nantong, China); Zhanqi Cui (Computer School, Beijing Information Science and Technology University, Beijing, China); Dun Zhang (School of Information Science and Technology, Nantong University, Nantong, China); Xiaolin Ju (School of Information Science and Technology, Nantong University, Nantong, China); https://doi.org/10.1049/sfw2.12006
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Xiang Chen Zhidan Yuan Zhanqi Cui Dun Zhang Xiaolin Ju
title	Empirical studies on the impact of filter‐based ranking feature selection on security vulnerability prediction
publisher	Wiley
series	IET Software
issn	1751-8806 1751-8814
publishDate	2021-02-01
description	Abstract Security vulnerability prediction (SVP) can construct models to identify potentially vulnerable program modules via machine learning. Two kinds of features from different points of view are used to measure the extracted modules in previous studies. One kind considers traditional software metrics as features, and the other kind uses text mining to extract term vectors as features. Therefore, gathered SVP data sets often have numerous features and result in the curse of dimensionality. In this article, we mainly investigate the impact of filter‐based ranking feature selection (FRFS) methods on SVP, since other types of feature selection methods have too much computational cost. In empirical studies, we first consider three real‐world large‐scale web applications. Then we consider seven methods from three FRFS categories for FRFS and use a random forest classifier to construct SVP models. Final results show that given the similar code inspection cost, using FRFS can improve the performance of SVP when compared with state‐of‐the‐art baselines. Moreover, we use McNemar's test to perform diversity analysis on identified vulnerable modules by using different FRFS methods, and we are surprised to find that almost all the FRFS methods can identify similar vulnerable modules via diversity analysis.
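The two steps named in the abstract, filter-based ranking of features and McNemar's test on the predictions of two models, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the seven FRFS methods, so absolute Pearson correlation is used here purely as an assumed stand-in scoring function, and the function names `rank_features` and `mcnemar` are hypothetical. The random forest training step is omitted.

```python
import math


def rank_features(X, y, k):
    """Return the indices of the k features with the largest |Pearson r| to y.

    Assumption: Pearson correlation is only an illustrative filter score;
    the record does not say which scores the paper actually uses.
    """
    n = len(y)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_x = sum(col) / n
        var_x = sum((v - mean_x) ** 2 for v in col)
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
        denom = math.sqrt(var_x * var_y)
        # A constant feature carries no ranking information: score it 0.
        scores.append((abs(cov / denom) if denom > 0 else 0.0, j))
    # Highest score first; ties broken by the lower feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


def mcnemar(pred_a, pred_b, truth):
    """McNemar's chi-square test with continuity correction.

    b counts modules only model A classifies correctly, c counts modules
    only model B classifies correctly. Returns (statistic, p_value).
    """
    b = sum(1 for pa, pb, t in zip(pred_a, pred_b, truth) if pa == t and pb != t)
    c = sum(1 for pa, pb, t in zip(pred_a, pred_b, truth) if pa != t and pb == t)
    if b + c == 0:
        return 0.0, 1.0  # the two models never disagree
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    X = [[1, 5, 0], [2, 3, 0], [3, 6, 1], [4, 2, 1]]  # toy module features
    y = [0, 0, 1, 1]                                  # 1 = vulnerable module
    print(rank_features(X, y, 2))
    print(mcnemar([1, 1, 1, 1, 0, 1], [0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0]))
```

A large p-value from `mcnemar` means the two models' disagreements are balanced, which is the sense in which different FRFS methods would be said to identify similar vulnerable modules.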

Similar Items