FastEmbed: Predicting vulnerability exploitation possibility based on ensemble machine learning algorithm.

In recent years, the number of vulnerabilities discovered and publicly disclosed has shown a sharp upward trend. However, the value of exploitation of vulnerabilities varies for attackers, considering that only a small fraction of vulnerabilities are exploited. Therefore, the realization of quick ex...

Bibliographic Details
Main Authors:	Yong Fang, Yongcheng Liu, Cheng Huang, Liang Liu
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2020-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0228439

id	doaj-8b4d34e2cf284e5fba91bde7b1127bb9
record_format	Article
spelling	doaj-8b4d34e2cf284e5fba91bde7b1127bb9 | 2021-03-03T21:28:45Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2020-01-01 | 15 | 2 | e0228439 | 10.1371/journal.pone.0228439 | FastEmbed: Predicting vulnerability exploitation possibility based on ensemble machine learning algorithm. | Yong Fang; Yongcheng Liu; Cheng Huang; Liang Liu | https://doi.org/10.1371/journal.pone.0228439
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Yong Fang Yongcheng Liu Cheng Huang Liang Liu
author_sort	Yong Fang
title	FastEmbed: Predicting vulnerability exploitation possibility based on ensemble machine learning algorithm.
title_sort	fastembed: predicting vulnerability exploitation possibility based on ensemble machine learning algorithm.
publisher	Public Library of Science (PLoS)
series	PLoS ONE
issn	1932-6203
publishDate	2020-01-01
description	In recent years, the number of vulnerabilities discovered and publicly disclosed has risen sharply. However, vulnerabilities differ in their value to attackers, and only a small fraction are ever exploited. Quickly excluding non-exploitable vulnerabilities and prioritizing patches under limited resources has therefore become imperative for organizations. Recent work uses machine learning to predict exploited vulnerabilities from features extracted from open-source intelligence (OSINT). However, given the explosive growth of vulnerability information, past methods leave room for improvement when applied to multiple threat intelligence sources, and a more general method is needed to handle them. Moreover, previous methods processed vulnerability-related descriptions with traditional text processing techniques, which capture only static statistical characteristics and ignore the context and meaning of the words. To address these challenges, we propose an exploit prediction model, called fastEmbed, that combines the fastText and LightGBM algorithms. We replicate key portions of the state-of-the-art work on exploit prediction and use them as benchmark models. By learning embeddings of vulnerability-related text on extremely imbalanced data sets, our model outperforms the baseline models in both generalization ability and prediction ability without temporal intermixing, with an average overall improvement of 6.283%. In predicting exploits in the wild, our model also outperforms the baseline model, with an F1 measure of 0.586 on the minority class (a 33.577% improvement over the work using features from darkweb/deepweb). The results demonstrate that the model can better describe the exploitability of vulnerabilities and effectively predict exploits in the wild.
url	https://doi.org/10.1371/journal.pone.0228439
