A parallel algorithm for network traffic anomaly detection based on Isolation Forest

With the rapid development of large-scale complex networks and proliferation of various social network applications, the amount of network traffic data generated is increasing tremendously, and efficient anomaly detection on those massive network traffic data is crucial to many network applications,...


Bibliographic Details
Main Authors: Xiaoling Tao, Yang Peng, Feng Zhao, Peichao Zhao, Yong Wang
Format: Article
Language: English
Published: SAGE Publishing 2018-11-01
Series: International Journal of Distributed Sensor Networks
Online Access: https://doi.org/10.1177/1550147718814471
id doaj-7941f0ad079b44c2abe447259ee7ef55
record_format Article
spelling doaj-7941f0ad079b44c2abe447259ee7ef55 2020-11-25T04:03:12Z eng
SAGE Publishing
International Journal of Distributed Sensor Networks 1550-1477
2018-11-01 14 10.1177/1550147718814471
A parallel algorithm for network traffic anomaly detection based on Isolation Forest
Xiaoling Tao (0), Yang Peng (1), Feng Zhao (2), Peichao Zhao (3), Yong Wang (4)
(0) Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, Guilin University of Electronic Technology, Guilin, China
(1) Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems, Guilin University of Electronic Technology, Guilin, China
(2) Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems, Guilin University of Electronic Technology, Guilin, China
(3) Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems, Guilin University of Electronic Technology, Guilin, China
(4) Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems, Guilin University of Electronic Technology, Guilin, China
https://doi.org/10.1177/1550147718814471
collection DOAJ
language English
format Article
sources DOAJ
author Xiaoling Tao
Yang Peng
Feng Zhao
Peichao Zhao
Yong Wang
title A parallel algorithm for network traffic anomaly detection based on Isolation Forest
publisher SAGE Publishing
series International Journal of Distributed Sensor Networks
issn 1550-1477
publishDate 2018-11-01
description With the rapid development of large-scale complex networks and the proliferation of social network applications, the volume of network traffic data is growing rapidly, and efficient anomaly detection on this massive traffic data is crucial to many network applications, such as malware detection, load balancing, and network intrusion detection. Although many methods exist for network traffic anomaly detection, they are all designed for a single machine and cannot handle traffic data that is too large for one computer to store and process. To solve this problem, we propose a parallel algorithm based on Isolation Forest and Spark for network traffic anomaly detection. It combines the strengths of the Isolation Forest algorithm in network traffic anomaly detection with the big data processing capability of Spark. We also parallelize both the modeling and the evaluation processes: by assigning tasks to multiple compute nodes, the proposed algorithm performs anomaly detection and evaluation efficiently, which also removes the computation bottleneck of a single machine. Extensive experiments on real-world datasets show that the proposed algorithm is efficient and scales well for anomaly detection on large network traffic data.
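The description above gives only the outline of the approach, so the following is a minimal sketch of the general idea and not the authors' implementation. It assumes the standard Isolation Forest formulation (random axis-aligned splits, anomaly score 2^(-E[h(x)]/c(psi))) and replaces Spark with Python's standard-library `ThreadPoolExecutor` as a stand-in for distributing tasks across compute nodes. All function names, parameters, and the synthetic data are hypothetical. Because of CPython's global interpreter lock, the threads here illustrate the task structure only and do not give a real speedup.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on this feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, plus the c(size) adjustment for that leaf."""
    if "size" in tree:
        return depth + c(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)


def train_one(task):
    """One independent modeling task: subsample the data and grow a single tree."""
    data, psi, seed = task
    rng = random.Random(seed)
    sample = rng.sample(data, psi)
    max_depth = math.ceil(math.log2(max(psi, 2)))
    return build_tree(sample, 0, max_depth, rng)


def fit_forest(data, n_trees=100, sample_size=256, workers=4, seed=0):
    """Modeling step: trees are independent, so each one is a separate task."""
    psi = min(sample_size, len(data))
    tasks = [(data, psi, seed + i) for i in range(n_trees)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(train_one, tasks)), psi


def anomaly_score(forest, psi, x):
    """Score in (0, 1]; values closer to 1 indicate a more anomalous point."""
    mean_depth = sum(path_length(t, x) for t in forest) / len(forest)
    return 2.0 ** (-mean_depth / c(psi))


def score_all(forest, psi, points, workers=4):
    """Evaluation step: each point is scored independently, one task per point."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda x: anomaly_score(forest, psi, x), points))


def demo_data(seed=42, n=200):
    """Synthetic stand-in for traffic features: a Gaussian cluster plus one far point."""
    rng = random.Random(seed)
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)] + [(8.0, 8.0)]


forest, psi = fit_forest(demo_data(), n_trees=50)
inlier_score, outlier_score = score_all(forest, psi, [(0.0, 0.0), (8.0, 8.0)])
print(f"inlier={inlier_score:.3f} outlier={outlier_score:.3f}")
```

In a Spark setting the same two independent task lists (one task per tree for modeling, one per record or partition for evaluation) would be distributed to executors, with the trained forest shared with the scoring tasks. How the published algorithm partitions the work is not stated in this record.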
url https://doi.org/10.1177/1550147718814471