Detecting web attacks using random undersampling and ensemble learners

Abstract Class imbalance is an important consideration for cybersecurity and machine learning. We explore classification performance in detecting web attacks in the recent CSE-CIC-IDS2018 dataset. This study considers a total of eight random undersampling (RUS) ratios: no sampling, 999:1, 99:1, 95:5...

Bibliographic Details
Main Authors:	Richard Zuech, John Hancock, Taghi M. Khoshgoftaar
Format:	Article
Language:	English
Published:	SpringerOpen 2021-05-01
Series:	Journal of Big Data
Subjects:	CSE-CIC-IDS2018 Intrusion Detection Web Attacks Class Imbalance Random Undersampling Ensemble Learners
Online Access:	https://doi.org/10.1186/s40537-021-00460-8

id	doaj-5346b0eb73bc4f4ebef27ec624dbf98e
record_format	Article
spelling	doaj-5346b0eb73bc4f4ebef27ec624dbf98e | 2021-05-30T11:51:33Z | eng | SpringerOpen | Journal of Big Data | 2196-1115 | 2021-05-01 | 81120 | 10.1186/s40537-021-00460-8 | Detecting web attacks using random undersampling and ensemble learners | Richard Zuech 0, John Hancock 1, Taghi M. Khoshgoftaar 2 | Florida Atlantic University, Florida Atlantic University, Florida Atlantic University | Abstract Class imbalance is an important consideration for cybersecurity and machine learning. We explore classification performance in detecting web attacks in the recent CSE-CIC-IDS2018 dataset. This study considers a total of eight random undersampling (RUS) ratios: no sampling, 999:1, 99:1, 95:5, 9:1, 3:1, 65:35, and 1:1. Additionally, seven different classifiers are employed: Decision Tree (DT), Random Forest (RF), CatBoost (CB), LightGBM (LGB), XGBoost (XGB), Naive Bayes (NB), and Logistic Regression (LR). For classification performance metrics, Area Under the Receiver Operating Characteristic Curve (AUC) and Area Under the Precision-Recall Curve (AUPRC) are both utilized to answer the following three research questions. The first question asks: “Are various random undersampling ratios statistically different from each other in detecting web attacks?” The second question asks: “Are different classifiers statistically different from each other in detecting web attacks?” And, our third question asks: “Is the interaction between different classifiers and random undersampling ratios significant for detecting web attacks?” Based on our experiments, the answer to all three research questions is “Yes”. To the best of our knowledge, we are the first to apply random undersampling techniques to web attacks from the CSE-CIC-IDS2018 dataset while exploring various sampling ratios. | https://doi.org/10.1186/s40537-021-00460-8 | CSE-CIC-IDS2018, Intrusion Detection, Web Attacks, Class Imbalance, Random Undersampling, Ensemble Learners
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Richard Zuech John Hancock Taghi M. Khoshgoftaar
spellingShingle	Richard Zuech John Hancock Taghi M. Khoshgoftaar Detecting web attacks using random undersampling and ensemble learners Journal of Big Data CSE-CIC-IDS2018 Intrusion Detection Web Attacks Class Imbalance Random Undersampling Ensemble Learners
author_facet	Richard Zuech John Hancock Taghi M. Khoshgoftaar
author_sort	Richard Zuech
title	Detecting web attacks using random undersampling and ensemble learners
title_short	Detecting web attacks using random undersampling and ensemble learners
title_full	Detecting web attacks using random undersampling and ensemble learners
title_fullStr	Detecting web attacks using random undersampling and ensemble learners
title_full_unstemmed	Detecting web attacks using random undersampling and ensemble learners
title_sort	detecting web attacks using random undersampling and ensemble learners
publisher	SpringerOpen
series	Journal of Big Data
issn	2196-1115
publishDate	2021-05-01
description	Abstract Class imbalance is an important consideration for cybersecurity and machine learning. We explore classification performance in detecting web attacks in the recent CSE-CIC-IDS2018 dataset. This study considers a total of eight random undersampling (RUS) ratios: no sampling, 999:1, 99:1, 95:5, 9:1, 3:1, 65:35, and 1:1. Additionally, seven different classifiers are employed: Decision Tree (DT), Random Forest (RF), CatBoost (CB), LightGBM (LGB), XGBoost (XGB), Naive Bayes (NB), and Logistic Regression (LR). For classification performance metrics, Area Under the Receiver Operating Characteristic Curve (AUC) and Area Under the Precision-Recall Curve (AUPRC) are both utilized to answer the following three research questions. The first question asks: “Are various random undersampling ratios statistically different from each other in detecting web attacks?” The second question asks: “Are different classifiers statistically different from each other in detecting web attacks?” And, our third question asks: “Is the interaction between different classifiers and random undersampling ratios significant for detecting web attacks?” Based on our experiments, the answer to all three research questions is “Yes”. To the best of our knowledge, we are the first to apply random undersampling techniques to web attacks from the CSE-CIC-IDS2018 dataset while exploring various sampling ratios.
topic	CSE-CIC-IDS2018 Intrusion Detection Web Attacks Class Imbalance Random Undersampling Ensemble Learners
url	https://doi.org/10.1186/s40537-021-00460-8
work_keys_str_mv	AT richardzuech detectingwebattacksusingrandomundersamplingandensemblelearners AT johnhancock detectingwebattacksusingrandomundersamplingandensemblelearners AT taghimkhoshgoftaar detectingwebattacksusingrandomundersamplingandensemblelearners
_version_	1721419917665763328
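
The abstract in this record lists eight random undersampling (RUS) ratios. As a rough illustration of what a ratio such as 99:1 means, the sketch below keeps every minority (attack) instance and randomly discards majority (normal) instances until the requested majority:minority ratio is approximately reached. It is not the authors' code: the function name `random_undersample`, the 0/1 label encoding, the fixed seed, and the toy label counts are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from collections import Counter

# Ratios named in the abstract, as (majority, minority) parts; None = no sampling.
RUS_RATIOS = [None, (999, 1), (99, 1), (95, 5), (9, 1), (3, 1), (65, 35), (1, 1)]


def random_undersample(labels, ratio, minority_label=1, seed=0):
    """Return sorted indices to keep so that majority:minority is about `ratio`.

    Hypothetical helper: all minority instances are kept and majority
    instances are dropped uniformly at random.
    """
    if ratio is None:
        return list(range(len(labels)))
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    majority_idx = [i for i, y in enumerate(labels) if y != minority_label]
    maj_parts, min_parts = ratio
    # Number of majority instances needed for the requested ratio,
    # capped at the number actually available.
    target = round(len(minority_idx) * maj_parts / min_parts)
    target = min(target, len(majority_idx))
    rng = random.Random(seed)
    kept_majority = rng.sample(majority_idx, target)
    return sorted(minority_idx + kept_majority)


if __name__ == "__main__":
    # Toy labels (assumed): 10,000 normal (0) and 10 attack (1) instances.
    labels = [0] * 10_000 + [1] * 10
    for ratio in RUS_RATIOS:
        kept = random_undersample(labels, ratio)
        counts = Counter(labels[i] for i in kept)
        print(ratio, "normal:", counts[0], "attack:", counts[1])
```

In an experiment of the kind the abstract describes, undersampling like this would normally be applied to training data only, so that evaluation metrics such as AUC and AUPRC are computed on data with the original class distribution; whether the study does exactly this is not stated in this record.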

Similar Items