Summary: | Existing semi-supervised anomaly detection methods usually rely on large amounts of labeled normal data for training, which incurs high labeling costs. Only a few semi-supervised methods train models from unlabeled data together with a small number of labeled anomalies. However, such methods usually face two problems: (i) because anomalies exhibit diverse behavior patterns and the internal mechanisms that produce them are complex and varied, a small number of labeled anomalies cannot cover all anomaly types; and (ii) the unlabeled data in the training set vastly outnumbers the labeled data, so contaminated unlabeled data often dominates the training process. To address these two problems, we propose a semi-supervised anomaly detection method named ConNet and a new loss function named concentration loss. Specifically, ConNet consists of two stages. First, a prior estimation module assigns each unlabeled instance a prior anomaly score, which is attached to the instance as its training weight. Then, an anomaly scoring network is trained to assign anomaly scores to data instances, ensuring that the scores of anomalies deviate significantly from those of normal instances. We conducted experiments on thirteen real-world data sets, evaluating detection accuracy, utilization efficiency of labeled data, and robustness to different contamination rates. The experimental results show that our method significantly outperforms state-of-the-art anomaly detection methods.
|
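The two-stage idea described above can be illustrated with a minimal toy sketch. This is not the paper's actual architecture or concentration loss: here the "prior estimation module" is replaced by a simple distance-to-mean score, and the "anomaly scoring network" by a linear scorer trained with a weighted pull-to-zero term for unlabeled data plus a margin term for labeled anomalies. All names and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: an unlabeled pool (mostly normal, with some contamination)
# plus a small set of labeled anomalies.
X_unlabeled = rng.normal(0.0, 1.0, size=(200, 2))
X_unlabeled[:10] += 4.0                      # hidden anomalies (contamination)
X_anom = rng.normal(4.0, 1.0, size=(5, 2))   # few labeled anomalies

# Stage 1 (hypothetical prior estimation): distance to the unlabeled mean as a
# cheap prior anomaly score, rescaled to [0, 1]. Likely-contaminated instances
# get a low training weight so they do not dominate training.
center = X_unlabeled.mean(axis=0)
prior = np.linalg.norm(X_unlabeled - center, axis=1)
prior = (prior - prior.min()) / (prior.max() - prior.min())
w_unlabeled = 1.0 - prior                    # weight ~ confidence of being normal

# Stage 2 (stand-in for the scoring network): train a linear scorer
# s(x) = w.x + b so unlabeled (mostly normal) scores concentrate near 0
# while labeled anomaly scores are pushed above a margin.
w, b = np.zeros(2), 0.0
lr, margin = 0.05, 5.0
for _ in range(300):
    s_u = X_unlabeled @ w + b
    s_a = X_anom @ w + b
    # Gradient of the weighted squared term pulling unlabeled scores to 0 ...
    g_u = (w_unlabeled * s_u) @ X_unlabeled / len(X_unlabeled)
    gb_u = np.mean(w_unlabeled * s_u)
    # ... and of a hinge term pushing labeled anomaly scores above the margin.
    viol = (s_a < margin).astype(float)
    g_a = -(viol @ X_anom) / len(X_anom)
    gb_a = -viol.mean()
    w -= lr * (g_u + g_a)
    b -= lr * (gb_u + gb_a)

def score(X):
    """Anomaly score: higher means more anomalous."""
    return X @ w + b
```

After training, labeled anomalies (and points resembling them) receive clearly higher scores than the clean bulk of the unlabeled pool, while the per-instance weights keep the contaminated fraction from dragging anomaly scores toward zero.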