Machine-Learning Approach to Optimize SMOTE Ratio in Class Imbalance Dataset for Intrusion Detection

The KDD CUP 1999 intrusion detection dataset was introduced at the third international knowledge discovery and data mining tools competition, and it has been widely used in many studies. The attack types of the KDD CUP 1999 dataset are divided into four categories: user to root (U2R), remote to local (...

Bibliographic Details
Main Authors:	Jae-Hyun Seo, Yong-Hyuk Kim
Format:	Article
Language:	English
Published:	Hindawi Limited 2018-01-01
Series:	Computational Intelligence and Neuroscience
Online Access:	http://dx.doi.org/10.1155/2018/9704672

id	doaj-322dc3e5561b44cc84909526ee610aab
record_format	Article
spelling	doaj-322dc3e5561b44cc84909526ee610aab; 2020-11-24T22:03:06Z; eng; Hindawi Limited; Computational Intelligence and Neuroscience; 1687-5265; 1687-5273; 2018-01-01; 2018; 10.1155/2018/9704672; 9704672; Machine-Learning Approach to Optimize SMOTE Ratio in Class Imbalance Dataset for Intrusion Detection; Jae-Hyun Seo (Department of Computer Science and Engineering, Wonkwang University, 460 Iksandae-ro, Iksan-si, Jeonbuk 54649, Republic of Korea); Yong-Hyuk Kim (School of Software, Kwangwoon University, 20 Kwangwoon-ro, Nowon-gu, Seoul 01897, Republic of Korea); http://dx.doi.org/10.1155/2018/9704672
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Jae-Hyun Seo Yong-Hyuk Kim
title	Machine-Learning Approach to Optimize SMOTE Ratio in Class Imbalance Dataset for Intrusion Detection
publisher	Hindawi Limited
series	Computational Intelligence and Neuroscience
issn	1687-5265 1687-5273
publishDate	2018-01-01
description	The KDD CUP 1999 intrusion detection dataset was introduced at the third international knowledge discovery and data mining tools competition, and it has been widely used in many studies. The attack types of the KDD CUP 1999 dataset are divided into four categories: user to root (U2R), remote to local (R2L), denial of service (DoS), and Probe. We use five classes by adding the normal class. We define the U2R, R2L, and Probe classes, which are each less than 1% of the total dataset, as rare classes. In this study, we attempt to mitigate the class imbalance of the dataset. Using the synthetic minority oversampling technique (SMOTE), we attempted to optimize the SMOTE ratios for the rare classes (U2R, R2L, and Probe). After randomly generating a number of tuples of SMOTE ratios, these tuples were used to create a numerical model for optimizing the SMOTE ratios of the rare classes. Support vector regression was used to create the model. We assigned each instance in the test dataset to the model and chose the best SMOTE ratios. The experiments using machine-learning techniques were conducted using the best ratios. The results using the proposed method were significantly better than those of the previous approach and other related work.
url	http://dx.doi.org/10.1155/2018/9704672
_version_	1725833256737177600
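
Illustrative sketch (not part of the catalog record or the article): the description above mentions oversampling the three rare classes with SMOTE under a tuple of per-class ratios, and drawing such tuples at random. The Python below is a minimal, standard-library illustration of those two steps only. Everything in it is an assumption made for illustration: the function names (smote, oversample, random_ratio_tuples), the toy data, the ratio range, and the reading of a "ratio" as the number of synthetic samples per original sample. The support vector regression model that the authors fit on the evaluated tuples is not reproduced here.

```python
import math
import random


def smote(samples, n_new, k=5, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest neighbours from the same class."""
    rng = rng or random.Random(0)
    if len(samples) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(samples) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # k nearest same-class neighbours of base (base itself excluded)
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: math.dist(base, s),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def oversample(data, ratios, k=5, seed=0):
    """Return a copy of data where each class named in ratios gains
    round(ratio * class_size) synthetic samples (assumed ratio meaning)."""
    rng = random.Random(seed)
    out = {label: list(rows) for label, rows in data.items()}
    for label, ratio in ratios.items():
        n_new = round(ratio * len(data[label]))
        out[label].extend(smote(data[label], n_new, k, rng))
    return out


def random_ratio_tuples(n, low=1.0, high=10.0, seed=0):
    """Draw n random (U2R, R2L, Probe) ratio tuples; the range is hypothetical."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(low, high) for _ in range(3)) for _ in range(n)]


# Toy two-feature data standing in for the five classes (DoS omitted for brevity).
toy = {
    "normal": [[float(i), float(i % 7)] for i in range(200)],
    "U2R": [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "R2L": [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0]],
    "Probe": [[10.0, 0.0], [11.0, 0.0], [10.0, 1.0], [11.0, 1.0]],
}

balanced = oversample(toy, {"U2R": 2.0, "R2L": 1.5, "Probe": 0.5})
print({label: len(rows) for label, rows in balanced.items()})
# prints {'normal': 200, 'U2R': 12, 'R2L': 10, 'Probe': 6}
```

In a fuller experiment, each tuple returned by random_ratio_tuples would be passed to oversample, a classifier would be trained and scored on the result, and a regression model such as support vector regression could then be fitted on the (tuple, score) pairs to search for promising ratios.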

Similar Items