Anomaly Detection for Individual Sequences with Applications in Identifying Malicious Tools

Anomaly detection refers to the problem of identifying abnormal behaviour within a set of measurements. In many cases, one has some statistical model for normal data, and wishes to identify whether new data fit the model or not. However, in others, while there are normal data to learn from, there is...

Bibliographic Details
Main Authors:	Shachar Siboni, Asaf Cohen
Format:	Article
Language:	English
Published:	MDPI AG 2020-06-01
Series:	Entropy
Subjects:	anomaly detection; individual sequences; one-dimensional time series; universal compression; probability assignment; statistical model
Online Access:	https://www.mdpi.com/1099-4300/22/6/649

id	doaj-f3a6835bb19b4cab8e9f5606a319485e
record_format	Article
spelling	doaj-f3a6835bb19b4cab8e9f5606a319485e | 2020-11-25T03:14:08Z | eng | MDPI AG | Entropy | 1099-4300 | 2020-06-01 | 22 | 6 | 649 | 649 | 10.3390/e22060649 | Anomaly Detection for Individual Sequences with Applications in Identifying Malicious Tools | Shachar Siboni 0 | Asaf Cohen 1 | Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel | School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel | Anomaly detection refers to the problem of identifying abnormal behaviour within a set of measurements. In many cases, one has some statistical model for normal data, and wishes to identify whether new data fit the model or not. However, in others, while there are normal data to learn from, there is no statistical model for this data, and there is no structured parameter set to estimate. Thus, one is forced to assume an individual sequences setup, where there is no given model or any guarantee that such a model exists. In this work, we propose a universal anomaly detection algorithm for one-dimensional time series that is able to learn the normal behaviour of systems and alert for abnormalities, without assuming anything on the normal data, or anything on the anomalies. The suggested method utilizes new information measures that were derived from the Lempel–Ziv (LZ) compression algorithm in order to optimally and efficiently learn the normal behaviour (during learning), and then estimate the likelihood of new data (during operation) and classify it accordingly. We apply the algorithm to key problems in computer security, as well as a benchmark anomaly detection data set, all using simple, single-feature time-indexed data. The first is detecting Botnets Command and Control (C&C) channels without deep inspection. We then apply it to the problems of malicious tools detection via system calls monitoring and data leakage identification. We conclude with the New York City (NYC) taxi data. Finally, while using information theoretic tools, we show that an attacker’s attempt to maliciously fool the detection system by trying to generate normal data is bound to fail, either due to a high probability of error or because of the need for huge amounts of resources. | https://www.mdpi.com/1099-4300/22/6/649 | anomaly detection; individual sequences; one-dimensional time series; universal compression; probability assignment; statistical model
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Shachar Siboni Asaf Cohen
spellingShingle	Shachar Siboni Asaf Cohen Anomaly Detection for Individual Sequences with Applications in Identifying Malicious Tools Entropy anomaly detection individual sequences one-dimensional time series universal compression probability assignment statistical model
author_facet	Shachar Siboni Asaf Cohen
author_sort	Shachar Siboni
title	Anomaly Detection for Individual Sequences with Applications in Identifying Malicious Tools
title_short	Anomaly Detection for Individual Sequences with Applications in Identifying Malicious Tools
title_full	Anomaly Detection for Individual Sequences with Applications in Identifying Malicious Tools
title_fullStr	Anomaly Detection for Individual Sequences with Applications in Identifying Malicious Tools
title_full_unstemmed	Anomaly Detection for Individual Sequences with Applications in Identifying Malicious Tools
title_sort	anomaly detection for individual sequences with applications in identifying malicious tools
publisher	MDPI AG
series	Entropy
issn	1099-4300
publishDate	2020-06-01
description	Anomaly detection refers to the problem of identifying abnormal behaviour within a set of measurements. In many cases, one has some statistical model for normal data, and wishes to identify whether new data fit the model or not. However, in others, while there are normal data to learn from, there is no statistical model for this data, and there is no structured parameter set to estimate. Thus, one is forced to assume an individual sequences setup, where there is no given model or any guarantee that such a model exists. In this work, we propose a universal anomaly detection algorithm for one-dimensional time series that is able to learn the normal behaviour of systems and alert for abnormalities, without assuming anything on the normal data, or anything on the anomalies. The suggested method utilizes new information measures that were derived from the Lempel–Ziv (LZ) compression algorithm in order to optimally and efficiently learn the normal behaviour (during learning), and then estimate the likelihood of new data (during operation) and classify it accordingly. We apply the algorithm to key problems in computer security, as well as a benchmark anomaly detection data set, all using simple, single-feature time-indexed data. The first is detecting Botnets Command and Control (C&C) channels without deep inspection. We then apply it to the problems of malicious tools detection via system calls monitoring and data leakage identification. We conclude with the New York City (NYC) taxi data. Finally, while using information theoretic tools, we show that an attacker’s attempt to maliciously fool the detection system by trying to generate normal data is bound to fail, either due to a high probability of error or because of the need for huge amounts of resources.
topic	anomaly detection; individual sequences; one-dimensional time series; universal compression; probability assignment; statistical model
url	https://www.mdpi.com/1099-4300/22/6/649
work_keys_str_mv	AT shacharsiboni anomalydetectionforindividualsequenceswithapplicationsinidentifyingmalicioustools AT asafcohen anomalydetectionforindividualsequenceswithapplicationsinidentifyingmalicioustools
_version_	1724644337295294464

Similar Items