Few-Shot Learning Based Balanced Distribution Adaptation for Heterogeneous Defect Prediction

Heterogeneous defect prediction (HDP) aims to predict the defect tendency of modules in one project using heterogeneous data collected from other projects. It sufficiently incorporates the two characteristics of the defect prediction data: (1) datasets could have different metrics and distribution,...

Full description

Bibliographic Details
Main Authors: Aili Wang, Yutong Zhang, Haibin Wu, Kaiyuan Jiang, Minhui Wang
Format: Article
Language:English
Published: IEEE 2020-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/8999527/
Description
Summary:Heterogeneous defect prediction (HDP) aims to predict the defect tendency of modules in one project using heterogeneous data collected from other projects. It sufficiently incorporates the two characteristics of the defect prediction data: (1) datasets could have different metrics and distribution, and (2) data could be highly imbalanced. In this paper, we propose a few-shot learning based balanced distribution adaptation (FSLBDA) approach for heterogeneous defect prediction, which takes into consideration the two characteristics of the defect prediction data. Class imbalance of the defect datasets can be solved with undersampling, but the scale of the training datasets will be smaller. Specifically, we first remove redundant metrics of datasets with extreme gradient boosting. Then, we reduce the data difference between the source domain and the target domain with the balanced distribution adaptation. It considers the marginal distribution and the probability of conditional distribution differences and adaptively assigns different weights to them. Finally, we use adaptive boosting to relieve the influence caused by the size of the training dataset is smaller, which can improve the accuracy of the defect prediction model. We conduct experiments on 17 projects from 4 datasets using 3 indicators (i.e., AUC, G-mean, F-measure). Compared to three classic approaches, the experimental results show that FSLBDA can effectively improve the prediction performance.
ISSN:2169-3536