Tripartite Active Learning for Interactive Anomaly Discovery

Most existing approaches to anomaly detection focus on statistical features of the data. However, in many cases, users are merely interested in a subset of the statistical outliers depending on the specific domain of interest, e.g., network attacks or financial fraud. The instruction from human expe...

Bibliographic Details
Main Authors: Yanqiao Zhu, Kai Yang
Format: Article
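Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Active learning; interactive artificial intelligence; anomaly detection; linear integer programming; human training
Online Access: https://ieeexplore.ieee.org/document/8707963/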
id doaj-5923a0b4c26c48b08ec4bd1b1c352081
record_format Article
spelling doaj-5923a0b4c26c48b08ec4bd1b1c352081
timestamp 2021-03-29T22:56:20Z
volume 7
pages 63195-63203
doi 10.1109/ACCESS.2019.2915388
document_id 8707963
author_affiliation Yanqiao Zhu: School of Software Engineering, Tongji University, Shanghai, China
Kai Yang (https://orcid.org/0000-0002-5983-198X): Department of Computer Science, Tongji University, Shanghai, China
collection DOAJ
language English
format Article
sources DOAJ
author Yanqiao Zhu
Kai Yang
title Tripartite Active Learning for Interactive Anomaly Discovery
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description Most existing approaches to anomaly detection focus on statistical features of the data. However, in many cases, users are interested only in a subset of the statistical outliers, depending on the specific domain of interest, e.g., network attacks or financial fraud. Instruction from human experts is therefore indispensable in building predictive models for such applications. However, obtaining labels from human experts is time-consuming and expensive. Obtaining labels from nonexpert labelers is relatively easy and cost-effective, but the labeling accuracy of a nonexpert is usually difficult to assess. It therefore remains an open problem how to leverage both machine intelligence and the knowledge of labelers with diverse backgrounds to construct a machine learning model for domain-specific anomaly detection. To this end, this paper proposes a framework of tripartite active learning for interactive anomaly discovery in large datasets based on crowdsourced labels. The tripartite active learning method consists of two stages. In the first stage, an unsupervised learning algorithm is employed to extract statistical outliers from the dataset. This algorithm has low computational complexity and low memory requirements and is thus well suited to large datasets. In the second stage, an iterative algorithm consisting of two steps is applied. The algorithm first evaluates and trains labelers based on gold instances provided by the expert labelers. Then, it assigns the most informative samples to the most confident labelers for relabeling and updates the detector based on the new labels. Capacity constraints are taken into account in the active learning approach to guarantee fair allocation of labeling instances as well as robustness against erroneous labels. Experiments show that the proposed algorithm provides an effective means for interactive anomaly detection. As far as we are aware, this is the first work that considers designing a tripartite machine learning system for domain-specific anomaly detection.
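The iterative stage described in the abstract (score labelers on gold instances, then route the most informative samples to the most reliable labelers under capacity limits) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the uncertainty measure, and the sample data are all hypothetical, and a simple greedy rule stands in for the linear integer program that the record's keywords mention.

```python
from typing import Dict, Sequence


def estimate_accuracy(gold: Sequence[int],
                      answers: Dict[str, Sequence[int]]) -> Dict[str, float]:
    """Fraction of expert-provided gold instances each labeler got right."""
    return {
        name: sum(a == g for a, g in zip(labels, gold)) / len(gold)
        for name, labels in answers.items()
    }


def informativeness(p_anomaly: float) -> float:
    """Assumed uncertainty score in [0, 1]; highest when the detector says 0.5."""
    return 1.0 - 2.0 * abs(p_anomaly - 0.5)


def assign(scores: Dict[str, float],
           accuracy: Dict[str, float],
           capacity: Dict[str, int]) -> Dict[str, str]:
    """Greedy stand-in for a capacity-constrained assignment.

    Samples are visited from most to least informative; each goes to the
    most accurate labeler that still has capacity left.
    """
    remaining = dict(capacity)
    ranked_samples = sorted(scores,
                            key=lambda s: informativeness(scores[s]),
                            reverse=True)
    ranked_labelers = sorted(accuracy, key=accuracy.get, reverse=True)
    assignment: Dict[str, str] = {}
    for sample in ranked_samples:
        for labeler in ranked_labelers:
            if remaining.get(labeler, 0) > 0:
                assignment[sample] = labeler
                remaining[labeler] -= 1
                break
    return assignment


# Hypothetical data: four gold instances and two nonexpert labelers.
gold = [1, 0, 1, 1]
answers = {"alice": [1, 0, 1, 1], "bob": [1, 1, 0, 1]}
accuracy = estimate_accuracy(gold, answers)

# Hypothetical detector outputs (probability that a sample is an anomaly).
scores = {"x1": 0.52, "x2": 0.95, "x3": 0.40}
plan = assign(scores, accuracy, {"alice": 1, "bob": 2})
print(accuracy)
print(plan)
```

In this sketch the capacity limit is what spreads work across labelers: once the most accurate labeler is full, remaining samples fall through to the next one. An exact formulation would instead optimize the whole assignment jointly, which the greedy rule does not guarantee.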
topic Active learning
interactive artificial intelligence
anomaly detection
linear integer programming
human training
url https://ieeexplore.ieee.org/document/8707963/