Correlation feature and instance weights transfer learning for cross project software defect prediction

Abstract Due to the differentiation between training and testing data in the feature space, cross‐project defect prediction (CPDP) remains unaddressed within the field of traditional machine learning. Recently, transfer learning has become a research hot‐spot for building classifiers in the target d...

Bibliographic Details
Main Authors: Quanyi Zou, Lu Lu, Shaojian Qiu, Xiaowei Gu, Ziyi Cai
Format: Article
Language: English
Published: Wiley 2021-02-01
Series: IET Software
Online Access: https://doi.org/10.1049/sfw2.12012
id doaj-950fb16496b84573ae3aeaac4bdaf4a4
record_format Article
spelling doaj-950fb16496b84573ae3aeaac4bdaf4a4
2021-08-02T08:25:07Z
eng
Wiley
IET Software
1751-8806
1751-8814
2021-02-01
Volume 15, Issue 1, pp. 55-74
10.1049/sfw2.12012
Correlation feature and instance weights transfer learning for cross project software defect prediction
Quanyi Zou (The School of Software Engineering, South China University of Technology, Guangzhou, China)
Lu Lu (The School of Computer Science and Engineering, South China University of Technology, Guangzhou, China)
Shaojian Qiu (The College of Mathematics and Informatics, South China Agricultural University, Guangzhou, China)
Xiaowei Gu (The School of Software Engineering, South China University of Technology, Guangzhou, China)
Ziyi Cai (The School of Computer Science and Engineering, South China University of Technology, Guangzhou, China)
collection DOAJ
language English
format Article
sources DOAJ
author Quanyi Zou
Lu Lu
Shaojian Qiu
Xiaowei Gu
Ziyi Cai
title Correlation feature and instance weights transfer learning for cross project software defect prediction
publisher Wiley
series IET Software
issn 1751-8806
1751-8814
publishDate 2021-02-01
description Abstract Due to the differentiation between training and testing data in the feature space, cross‐project defect prediction (CPDP) remains unaddressed within the field of traditional machine learning. Recently, transfer learning has become a research hot‐spot for building classifiers in the target domain using the data from the related source domains. To implement better CPDP models, recent studies focus on either feature transferring or instance transferring to weaken the impact of irrelevant cross‐project data. Instead, this work proposes a dual weighting mechanism to aid the learning process, considering both feature transferring and instance transferring. In our method, a local data gravitation between source and target domains determines instance weight, while features that are highly correlated with the learning task, uncorrelated with other features and minimizing the difference between the domains are rewarded with a higher feature weight. Experiments on 25 real‐world datasets indicate that the proposed approach outperforms the existing CPDP methods in most cases. By assigning weights based on the different contribution of features and instances to the predictor, the proposed approach is able to build a better CPDP model and demonstrates substantial improvements over the state‐of‐the‐art CPDP models.
url https://doi.org/10.1049/sfw2.12012