Detecting web attacks using random undersampling and ensemble learners
Main Authors: | Richard Zuech, John Hancock, Taghi M. Khoshgoftaar |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2021-05-01 |
Series: | Journal of Big Data |
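Subjects: | CSE-CIC-IDS2018; Intrusion Detection; Web Attacks; Class Imbalance; Random Undersampling; Ensemble Learners |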
Online Access: | https://doi.org/10.1186/s40537-021-00460-8 |
id | doaj-5346b0eb73bc4f4ebef27ec624dbf98e |
---|---|
record_format | Article |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Richard Zuech, John Hancock, Taghi M. Khoshgoftaar (all Florida Atlantic University) |
publisher | SpringerOpen |
series | Journal of Big Data |
issn | 2196-1115 |
publishDate | 2021-05-01 |
description | Abstract: Class imbalance is an important consideration for cybersecurity and machine learning. We explore classification performance in detecting web attacks in the recent CSE-CIC-IDS2018 dataset. This study considers a total of eight random undersampling (RUS) ratios: no sampling, 999:1, 99:1, 95:5, 9:1, 3:1, 65:35, and 1:1. Additionally, seven different classifiers are employed: Decision Tree (DT), Random Forest (RF), CatBoost (CB), LightGBM (LGB), XGBoost (XGB), Naive Bayes (NB), and Logistic Regression (LR). For classification performance metrics, Area Under the Receiver Operating Characteristic Curve (AUC) and Area Under the Precision-Recall Curve (AUPRC) are both utilized to answer the following three research questions. The first question asks: “Are various random undersampling ratios statistically different from each other in detecting web attacks?” The second question asks: “Are different classifiers statistically different from each other in detecting web attacks?” And our third question asks: “Is the interaction between different classifiers and random undersampling ratios significant for detecting web attacks?” Based on our experiments, the answers to all three research questions are “Yes”. To the best of our knowledge, we are the first to apply random undersampling techniques to web attacks from the CSE-CIC-IDS2018 dataset while exploring various sampling ratios. |
topic | CSE-CIC-IDS2018; Intrusion Detection; Web Attacks; Class Imbalance; Random Undersampling; Ensemble Learners |
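The sampling ratios listed in the abstract (999:1 through 1:1) are majority:minority class ratios after random undersampling (RUS), where majority-class instances are discarded at random and every minority-class instance is kept. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation: the function `random_undersample`, its parameters, and the 0/1 label encoding (1 = web attack, the minority class) are illustrative and hypothetical.

```python
import random
from typing import List, Sequence


def random_undersample(
    labels: Sequence[int],
    majority_share: int,
    minority_share: int,
    seed: int = 0,
) -> List[int]:
    """Hypothetical helper: return the indices kept after random undersampling.

    Assumes binary labels where 1 is the minority class (web attack) and 0 is
    the majority class. All minority instances are kept; majority instances
    are dropped at random so that majority:minority is at most
    majority_share:minority_share.
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]

    # Majority instances allowed by the requested ratio, rounded down.
    target = len(minority) * majority_share // minority_share

    if target >= len(majority):
        # Data is already at or below the requested imbalance: drop nothing.
        kept_majority = majority
    else:
        # Sample without replacement from the majority-class indices.
        kept_majority = rng.sample(majority, target)

    return sorted(minority + kept_majority)


if __name__ == "__main__":
    labels = [0] * 1000 + [1] * 10
    for ratio in [(999, 1), (99, 1), (95, 5), (9, 1), (3, 1), (65, 35), (1, 1)]:
        kept = random_undersample(labels, *ratio)
        attacks = sum(labels[i] for i in kept)
        print(ratio, "kept:", len(kept), "attacks:", attacks)
```

For example, with 1,000 normal and 10 attack labels, a 9:1 ratio keeps 90 normal and all 10 attack instances, while 999:1 removes nothing because the toy data is already less imbalanced than that ratio. The "no sampling" setting corresponds to skipping this step entirely.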