A parallel algorithm for network traffic anomaly detection based on Isolation Forest
With the rapid development of large-scale complex networks and the proliferation of social network applications, the volume of network traffic data is growing tremendously, and efficient anomaly detection on such massive data is crucial to many network applications,...
Main Authors: | Xiaoling Tao, Yang Peng, Feng Zhao, Peichao Zhao, Yong Wang |
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing, 2018-11-01 |
Series: | International Journal of Distributed Sensor Networks |
Online Access: | https://doi.org/10.1177/1550147718814471 |
id |
doaj-7941f0ad079b44c2abe447259ee7ef55 |
---|---|
record_format |
Article |
spelling |
doaj-7941f0ad079b44c2abe447259ee7ef55; 2020-11-25T04:03:12Z; eng; SAGE Publishing; International Journal of Distributed Sensor Networks; 1550-1477; 2018-11-01; 14; 10.1177/1550147718814471; A parallel algorithm for network traffic anomaly detection based on Isolation Forest; Xiaoling Tao (0), Yang Peng (1), Feng Zhao (2), Peichao Zhao (3), Yong Wang (4); Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, Guilin University of Electronic Technology, Guilin, China; Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems, Guilin University of Electronic Technology, Guilin, China; Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems, Guilin University of Electronic Technology, Guilin, China; Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems, Guilin University of Electronic Technology, Guilin, China; Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems, Guilin University of Electronic Technology, Guilin, China; With the rapid development of large-scale complex networks and the proliferation of social network applications, the volume of network traffic data is growing tremendously, and efficient anomaly detection on such massive data is crucial to many network applications, such as malware detection, load balancing, and network intrusion detection. Although many methods exist for network traffic anomaly detection, they are all designed for a single machine and cannot handle network traffic data too large for one computer to store and process. To solve this problem, we propose a parallel algorithm for network traffic anomaly detection based on Isolation Forest and Spark. It combines the strengths of the Isolation Forest algorithm in network traffic anomaly detection with the big data processing capability of Spark. We also parallelize the modeling and evaluation processes: by assigning tasks to multiple compute nodes, the Isolation Forest and Spark algorithm performs anomaly detection and evaluation efficiently, which removes the computation bottleneck of a single machine. Extensive experiments on real-world datasets show that the proposed Isolation Forest and Spark algorithm is efficient and scales well for anomaly detection on large network traffic data.; https://doi.org/10.1177/1550147718814471 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Xiaoling Tao, Yang Peng, Feng Zhao, Peichao Zhao, Yong Wang |
author_sort |
Xiaoling Tao |
title |
A parallel algorithm for network traffic anomaly detection based on Isolation Forest |
publisher |
SAGE Publishing |
series |
International Journal of Distributed Sensor Networks |
issn |
1550-1477 |
publishDate |
2018-11-01 |
description |
With the rapid development of large-scale complex networks and the proliferation of social network applications, the volume of network traffic data is growing tremendously, and efficient anomaly detection on such massive data is crucial to many network applications, such as malware detection, load balancing, and network intrusion detection. Although many methods exist for network traffic anomaly detection, they are all designed for a single machine and cannot handle network traffic data too large for one computer to store and process. To solve this problem, we propose a parallel algorithm for network traffic anomaly detection based on Isolation Forest and Spark. It combines the strengths of the Isolation Forest algorithm in network traffic anomaly detection with the big data processing capability of Spark. We also parallelize the modeling and evaluation processes: by assigning tasks to multiple compute nodes, the Isolation Forest and Spark algorithm performs anomaly detection and evaluation efficiently, which removes the computation bottleneck of a single machine. Extensive experiments on real-world datasets show that the proposed Isolation Forest and Spark algorithm is efficient and scales well for anomaly detection on large network traffic data. |
url |
https://doi.org/10.1177/1550147718814471 |
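
Illustrative sketch (not part of the catalogued record, and not the authors' implementation): the description above mentions building and evaluating an Isolation Forest by assigning tasks to multiple compute nodes. The minimal Python example below shows the general idea under loud assumptions: a standard-library thread pool stands in for Spark workers, each task trains one isolation tree on its own random subsample, and the score uses the usual Isolation Forest normalization `2 ** (-E[h(x)] / c(psi))`. All function names (`fit_forest`, `train_one_tree`, and so on), the parameter values, and the synthetic data are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively isolate points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on this feature; stop here
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, point, depth=0):
    """Depth at which the point lands, plus a correction for unsplit leaves."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    child = tree["left"] if point[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, point, depth + 1)


def train_one_tree(task):
    """One independent unit of work: subsample the data and grow one tree."""
    data, sample_size, seed = task
    rng = random.Random(seed)  # per-task seed keeps results reproducible
    sample = rng.sample(data, min(sample_size, len(data)))
    max_depth = math.ceil(math.log2(max(len(sample), 2)))
    return build_tree(sample, 0, max_depth, rng)


def fit_forest(data, n_trees=100, sample_size=256, workers=4, seed=0):
    """Train trees in parallel; the thread pool is a stand-in for Spark workers."""
    tasks = [(data, sample_size, seed + i) for i in range(n_trees)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(train_one_tree, tasks))


def anomaly_score(forest, point, sample_size):
    """Score in (0, 1]; values closer to 1 indicate a more anomalous point."""
    mean_depth = sum(path_length(t, point) for t in forest) / len(forest)
    return 2.0 ** (-mean_depth / c_factor(sample_size))


# Synthetic demo data (hypothetical): a Gaussian cluster plus one distant point.
data_rng = random.Random(42)
normal = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(300)]
outlier = (8.0, 8.0)
forest = fit_forest(normal + [outlier], n_trees=100, sample_size=128)
score_outlier = anomaly_score(forest, outlier, 128)
score_inlier = anomaly_score(forest, (0.0, 0.0), 128)
print(f"outlier score: {score_outlier:.3f}, inlier score: {score_inlier:.3f}")
```

Because each tree depends only on its own subsample and seed, the training tasks share no state, which is what makes this step straightforward to distribute; scoring many points could be spread across workers in the same way.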