Tripartite Active Learning for Interactive Anomaly Discovery

Most existing approaches to anomaly detection focus on statistical features of the data. However, in many cases, users are merely interested in a subset of the statistical outliers depending on the specific domain of interest, e.g., network attacks or financial fraud. The instruction from human expe...


Bibliographic Details
Main Authors:	Yanqiao Zhu, Kai Yang
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	Active learning; interactive artificial intelligence; anomaly detection; linear integer programming; human training
Online Access:	https://ieeexplore.ieee.org/document/8707963/

id	doaj-5923a0b4c26c48b08ec4bd1b1c352081
record_format	Article
spelling	doaj-5923a0b4c26c48b08ec4bd1b1c352081 | 2021-03-29T22:56:20Z | eng | IEEE | IEEE Access | 2169-3536 | 2019-01-01 | vol. 7, pp. 63195-63203 | 10.1109/ACCESS.2019.2915388 | 8707963 | Tripartite Active Learning for Interactive Anomaly Discovery | Yanqiao Zhu (School of Software Engineering, Tongji University, Shanghai, China) | Kai Yang (https://orcid.org/0000-0002-5983-198X; Department of Computer Science, Tongji University, Shanghai, China) | https://ieeexplore.ieee.org/document/8707963/ | Active learning; interactive artificial intelligence; anomaly detection; linear integer programming; human training
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Yanqiao Zhu Kai Yang
spellingShingle	Yanqiao Zhu Kai Yang Tripartite Active Learning for Interactive Anomaly Discovery IEEE Access Active learning interactive artificial intelligence anomaly detection linear integer programming human training
author_facet	Yanqiao Zhu Kai Yang
author_sort	Yanqiao Zhu
title	Tripartite Active Learning for Interactive Anomaly Discovery
title_short	Tripartite Active Learning for Interactive Anomaly Discovery
title_full	Tripartite Active Learning for Interactive Anomaly Discovery
title_fullStr	Tripartite Active Learning for Interactive Anomaly Discovery
title_full_unstemmed	Tripartite Active Learning for Interactive Anomaly Discovery
title_sort	tripartite active learning for interactive anomaly discovery
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2019-01-01
description	Most existing approaches to anomaly detection focus on statistical features of the data. However, in many cases, users are merely interested in a subset of the statistical outliers, depending on the specific domain of interest, e.g., network attacks or financial fraud. The instruction from human experts is therefore indispensable in building predictive models for such applications. However, obtaining labels from human experts is time-consuming and expensive. Obtaining labels from nonexpert labelers is relatively easy and cost-effective, but the labeling accuracy of a nonexpert is usually difficult to assess. It therefore remains an open problem how to leverage both machine intelligence and the knowledge of labelers with diverse backgrounds to construct a machine learning model for domain-specific anomaly detection. To this end, this paper proposes a framework of tripartite active learning for interactive anomaly discovery in large datasets based on crowdsourced labels. The tripartite active learning method consists of two stages. In the first stage, an unsupervised learning algorithm is employed to extract statistical outliers from the dataset. This algorithm has low computational complexity and memory requirements and is thus well suited to large datasets. We then develop an iterative algorithm consisting of two steps. The algorithm first evaluates and trains labelers based on gold instances provided by the expert labelers. Then, it assigns the most informative samples to the most confident labeler for relabeling and updates the detector based on the new labels. Capacity constraints are taken into account in the active learning approach to guarantee fair allocation of labeling instances as well as robustness against erroneous labels. Experiments show that the proposed algorithm provides an effective means for interactive anomaly detection. As far as we are aware, this is the first work that considers designing a tripartite machine learning system for domain-specific anomaly detection.
topic	Active learning; interactive artificial intelligence; anomaly detection; linear integer programming; human training
url	https://ieeexplore.ieee.org/document/8707963/
work_keys_str_mv	AT yanqiaozhu tripartiteactivelearningforinteractiveanomalydiscovery AT kaiyang tripartiteactivelearningforinteractiveanomalydiscovery
_version_	1724190548849328128
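
Illustrative sketch (not part of the catalog record or the paper). The description above outlines an iterative step: labelers are evaluated on gold instances, then the most informative samples are assigned to the most confident labeler under capacity constraints. The Python below is a minimal sketch of that step under stated assumptions: accuracy on gold instances stands in for labeler confidence, "most informative" is taken to mean an anomaly probability closest to 0.5, and a greedy pass replaces whatever optimization the paper actually uses (the subject terms mention linear integer programming). All function and variable names are hypothetical.

```python
from typing import Dict, List, Sequence


def labeler_accuracy(answers: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of gold instances a labeler answered correctly."""
    if not gold:
        raise ValueError("need at least one gold instance")
    correct = sum(1 for a, g in zip(answers, gold) if a == g)
    return correct / len(gold)


def most_informative(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k samples whose anomaly probability is closest to 0.5.

    Assumption: uncertainty sampling is used as the informativeness measure.
    """
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))
    return order[:k]


def assign(samples: Sequence[int],
           accuracy: Dict[str, float],
           capacity: Dict[str, int]) -> Dict[int, str]:
    """Greedily give each sample to the most accurate labeler with capacity left.

    Assumption: a greedy stand-in for a capacity-constrained assignment,
    not the paper's formulation. Samples are skipped once all capacity is used.
    """
    remaining = dict(capacity)
    ranked = sorted(accuracy, key=accuracy.get, reverse=True)
    assignment: Dict[int, str] = {}
    for s in samples:
        for name in ranked:
            if remaining.get(name, 0) > 0:
                assignment[s] = name
                remaining[name] -= 1
                break
    return assignment


if __name__ == "__main__":
    gold = [1, 0, 1, 1]  # expert-provided labels (made-up example data)
    acc = {
        "a": labeler_accuracy([1, 0, 1, 0], gold),
        "b": labeler_accuracy([0, 0, 1, 0], gold),
    }
    picked = most_informative([0.9, 0.55, 0.1, 0.48, 0.3], k=3)
    print(assign(picked, acc, {"a": 2, "b": 2}))
```

The per-labeler capacity keeps any single labeler from receiving every query, which is one simple reading of the "fair allocation" mentioned in the description.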


Similar Items