Local versus Global Models for Just-In-Time Software Defect Prediction

Just-in-time software defect prediction (JIT-SDP) is an active topic in software defect prediction, which aims to identify defect-inducing changes. Recently, some studies have found that the variability of defect data sets can affect the performance of defect predictors. By using local models, it ca...

Bibliographic Details
Main Authors: Xingguang Yang, Huiqun Yu, Guisheng Fan, Kai Shi, Liqiong Chen
Format: Article
Language: English
Published: Hindawi Limited 2019-01-01
Series: Scientific Programming
Online Access: http://dx.doi.org/10.1155/2019/2384706
id doaj-26bbabbd2cb9478fa48369145564d9d4
record_format Article
spelling doaj-26bbabbd2cb9478fa48369145564d9d4
2021-07-02T14:15:53Z
eng
Hindawi Limited
Scientific Programming
1058-9244
1875-919X
2019-01-01
2019
10.1155/2019/2384706
2384706
Local versus Global Models for Just-In-Time Software Defect Prediction
Xingguang Yang (Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China)
Huiqun Yu (Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China)
Guisheng Fan (Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China)
Kai Shi (Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China)
Liqiong Chen (Department of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China)
http://dx.doi.org/10.1155/2019/2384706
collection DOAJ
language English
format Article
sources DOAJ
author Xingguang Yang
Huiqun Yu
Guisheng Fan
Kai Shi
Liqiong Chen
author_sort Xingguang Yang
title Local versus Global Models for Just-In-Time Software Defect Prediction
publisher Hindawi Limited
series Scientific Programming
issn 1058-9244
1875-919X
publishDate 2019-01-01
description Just-in-time software defect prediction (JIT-SDP) is an active topic in software defect prediction that aims to identify defect-inducing changes. Recent studies have found that the variability of defect data sets can affect the performance of defect predictors, and that using local models can help improve prediction performance. However, previous studies have focused on module-level defect prediction, so whether local models remain valid in the context of JIT-SDP is an important open question. To this end, we compare the performance of local and global models through a large-scale empirical study based on six open-source projects with 227,417 changes. The experiment considers three evaluation scenarios: cross-validation, cross-project validation, and timewise cross-validation. To build local models, the experiment uses k-medoids clustering to divide the training set into several homogeneous regions. Logistic regression and effort-aware linear regression (EALR) are used to build classification models and effort-aware prediction models, respectively. The empirical results show that local models perform worse than global models in classification performance. However, local models have significantly better effort-aware prediction performance than global models in the cross-validation and cross-project validation scenarios. In particular, local models obtain their best effort-aware prediction performance when the number of clusters k is set to 2. Therefore, local models are promising for effort-aware JIT-SDP.
url http://dx.doi.org/10.1155/2019/2384706
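
The local-model idea summarized in the description (cluster the training changes with k-medoids, then fit one classifier per cluster and route each new change to the model of its nearest medoid) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the alternating k-medoids routine, the gradient-descent logistic regression, the centering of features on each medoid, and all function names are assumptions made for the example, and EALR is not shown.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, medoids):
    """Index of the medoid closest to point."""
    return min(range(len(medoids)), key=lambda i: dist(point, medoids[i]))


def k_medoids(X, k, iters=20, seed=0):
    """Alternating k-medoids: assign points, then re-pick each cluster's medoid."""
    rng = random.Random(seed)
    medoids = rng.sample(X, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in X:
            clusters[nearest(p, medoids)].append(p)
        # The new medoid is the member with the smallest total distance to its cluster.
        updated = [
            min(c, key=lambda m: sum(dist(m, q) for q in c)) if c else medoids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == medoids:
            break
        medoids = updated
    return medoids


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    """Linear score; the last weight is the bias."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient-descent logistic regression (illustrative, unregularized)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, xi)) - yi
            for j, v in enumerate(xi):
                grad[j] += err * v
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def fit_local(X, y, k=2):
    """Cluster the training set, then fit one logistic model per cluster."""
    medoids = k_medoids(X, k)
    models = []
    for i, m in enumerate(medoids):
        rows = [j for j, p in enumerate(X) if nearest(p, medoids) == i]
        if not rows:
            models.append(None)  # empty cluster: never selected by nearest()
            continue
        # Assumption: center features on the medoid to keep gradient descent stable.
        centered = [[v - c for v, c in zip(X[j], m)] for j in rows]
        models.append(fit_logistic(centered, [y[j] for j in rows]))
    return medoids, models


def predict_local(x, medoids, models):
    """Route x to its nearest medoid's model and return a defect probability."""
    i = nearest(x, medoids)
    centered = [v - c for v, c in zip(x, medoids[i])]
    return sigmoid(score(models[i], centered))
```

A global model, by contrast, would call `fit_logistic` once on the whole training set. The sketch uses k=2 by default only because the description reports that value as giving the best effort-aware results; the toy classifier here does not model effort.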