Tripartite Active Learning for Interactive Anomaly Discovery
Most existing approaches to anomaly detection focus on statistical features of the data. However, in many cases, users are merely interested in a subset of the statistical outliers depending on the specific domain of interest, e.g., network attacks or financial fraud. The instruction from human expe...
Main Authors: | Yanqiao Zhu, Kai Yang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2019-01-01 |
Series: | IEEE Access |
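Subjects: | Active learning; interactive artificial intelligence; anomaly detection; linear integer programming; human training |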
Online Access: | https://ieeexplore.ieee.org/document/8707963/ |
id |
doaj-5923a0b4c26c48b08ec4bd1b1c352081 |
---|---|
record_format |
Article |
spelling |
doaj-5923a0b4c26c48b08ec4bd1b1c352081
2021-03-29T22:56:20Z
eng
IEEE
IEEE Access
2169-3536
2019-01-01
7
63195
63203
10.1109/ACCESS.2019.2915388
8707963
Tripartite Active Learning for Interactive Anomaly Discovery
Yanqiao Zhu [0]
Kai Yang [1] https://orcid.org/0000-0002-5983-198X
School of Software Engineering, Tongji University, Shanghai, China
Department of Computer Science, Tongji University, Shanghai, China
Most existing approaches to anomaly detection focus on statistical features of the data. However, in many cases, users are interested only in a subset of the statistical outliers, depending on the specific domain of interest, e.g., network attacks or financial fraud. Instruction from human experts is therefore indispensable in building predictive models in such applications. However, obtaining labels from human experts is time-consuming and expensive. Obtaining labels from nonexpert labelers is relatively easy and cost-effective, but the labeling accuracy of a nonexpert is usually difficult to assess. Therefore, it remains an open problem how to leverage both machine intelligence and the knowledge of labelers with diverse backgrounds to construct a machine learning model for domain-specific anomaly detection. To this end, this paper proposes a framework of tripartite active learning for interactive anomaly discovery in large datasets based on crowdsourced labels. This tripartite active learning method consists of two stages. In the first stage, an unsupervised learning algorithm is employed to extract statistical outliers from the dataset. This algorithm has low computational complexity and memory requirements and is thus well suited for large datasets. We then develop an iterative algorithm consisting of two steps. The algorithm first evaluates and trains labelers based on gold instances provided by the expert labelers. Then, it assigns the most informative samples to the most confident labeler for relabeling and updates the detector based on the new labels. Capacity constraints are taken into account in the active learning approach to guarantee fair allocation of labeling instances as well as robustness against erroneous labels. Experiments show that the proposed algorithm provides an effective means for interactive anomaly detection. To the best of our knowledge, this is the first work that considers designing a tripartite machine learning system for domain-specific anomaly detection.
https://ieeexplore.ieee.org/document/8707963/
Active learning; interactive artificial intelligence; anomaly detection; linear integer programming; human training |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Yanqiao Zhu; Kai Yang |
spellingShingle |
Yanqiao Zhu; Kai Yang; Tripartite Active Learning for Interactive Anomaly Discovery; IEEE Access; Active learning; interactive artificial intelligence; anomaly detection; linear integer programming; human training |
author_facet |
Yanqiao Zhu; Kai Yang |
author_sort |
Yanqiao Zhu |
title |
Tripartite Active Learning for Interactive Anomaly Discovery |
title_sort |
tripartite active learning for interactive anomaly discovery |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2019-01-01 |
description |
Most existing approaches to anomaly detection focus on statistical features of the data. However, in many cases, users are interested only in a subset of the statistical outliers, depending on the specific domain of interest, e.g., network attacks or financial fraud. Instruction from human experts is therefore indispensable in building predictive models in such applications. However, obtaining labels from human experts is time-consuming and expensive. Obtaining labels from nonexpert labelers is relatively easy and cost-effective, but the labeling accuracy of a nonexpert is usually difficult to assess. Therefore, it remains an open problem how to leverage both machine intelligence and the knowledge of labelers with diverse backgrounds to construct a machine learning model for domain-specific anomaly detection. To this end, this paper proposes a framework of tripartite active learning for interactive anomaly discovery in large datasets based on crowdsourced labels. This tripartite active learning method consists of two stages. In the first stage, an unsupervised learning algorithm is employed to extract statistical outliers from the dataset. This algorithm has low computational complexity and memory requirements and is thus well suited for large datasets. We then develop an iterative algorithm consisting of two steps. The algorithm first evaluates and trains labelers based on gold instances provided by the expert labelers. Then, it assigns the most informative samples to the most confident labeler for relabeling and updates the detector based on the new labels. Capacity constraints are taken into account in the active learning approach to guarantee fair allocation of labeling instances as well as robustness against erroneous labels. Experiments show that the proposed algorithm provides an effective means for interactive anomaly detection. To the best of our knowledge, this is the first work that considers designing a tripartite machine learning system for domain-specific anomaly detection. |
topic |
Active learning; interactive artificial intelligence; anomaly detection; linear integer programming; human training |
url |
https://ieeexplore.ieee.org/document/8707963/ |
work_keys_str_mv |
AT yanqiaozhu tripartiteactivelearningforinteractiveanomalydiscovery AT kaiyang tripartiteactivelearningforinteractiveanomalydiscovery |
_version_ |
1724190548849328128 |
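
The iterative step summarized in the abstract (score labelers on expert-provided gold instances, then route the most informative samples to the most reliable labelers under capacity limits) can be illustrated with a minimal sketch. This is not the authors' method: the record lists linear integer programming as a keyword, whereas the sketch below swaps in a simple greedy assignment. The function names, the use of gold-instance accuracy as the "confidence" of a labeler, and the informativeness scores are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def labeler_accuracy(answers: Dict[str, int], gold: Dict[str, int]) -> float:
    """Fraction of gold instances a labeler answered correctly (hypothetical metric)."""
    scored = [k for k in gold if k in answers]
    if not scored:
        return 0.0
    return sum(answers[k] == gold[k] for k in scored) / len(scored)


def assign_samples(
    informativeness: Dict[str, float],
    accuracy: Dict[str, float],
    capacity: Dict[str, int],
) -> List[Tuple[str, str]]:
    """Greedy stand-in for a capacity-constrained assignment.

    Samples are visited from most to least informative; each goes to the
    most accurate labeler that still has capacity left.
    """
    remaining = dict(capacity)
    plan: List[Tuple[str, str]] = []
    for sample in sorted(informativeness, key=lambda s: (-informativeness[s], s)):
        free = [name for name in remaining if remaining[name] > 0]
        if not free:
            break  # every labeler is at capacity
        best = min(free, key=lambda name: (-accuracy[name], name))
        plan.append((sample, best))
        remaining[best] -= 1
    return plan


# Illustrative data only; none of these values come from the paper.
gold = {"g1": 1, "g2": 0, "g3": 1, "g4": 0}
answers = {
    "alice": {"g1": 1, "g2": 0, "g3": 1, "g4": 1},
    "bob": {"g1": 1, "g2": 1, "g3": 0, "g4": 0},
}
accuracy = {name: labeler_accuracy(a, gold) for name, a in answers.items()}
plan = assign_samples(
    informativeness={"x1": 0.9, "x2": 0.4, "x3": 0.7},
    accuracy=accuracy,
    capacity={"alice": 1, "bob": 1},
)
print(plan)  # [('x1', 'alice'), ('x3', 'bob')]
```

With one slot per labeler, the least informative sample (x2) is left unassigned, which shows how a capacity limit keeps any single labeler from receiving every instance. A greedy pass like this is not guaranteed to be optimal; an integer program would be the natural way to optimize the whole assignment jointly.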