Leveraging Defects Life-Cycle for Labeling Defective Classes
Data from software repositories are a very useful asset for building different kinds of models and recommender systems aimed at supporting software developers. Specifically, the identification of likely defect-prone files (i.e., classes in Object-Oriented systems) helps in prioritizing testing and analysis ac...
Main Author: | Vandehei, Bailey R |
---|---|
Format: | Others |
Published: | DigitalCommons@CalPoly, 2019 |
Subjects: | |
Online Access: | https://digitalcommons.calpoly.edu/theses/2111 https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=3565&context=theses |
id |
ndltd-CALPOLY-oai-digitalcommons.calpoly.edu-theses-3565 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-CALPOLY-oai-digitalcommons.calpoly.edu-theses-3565 2021-08-20T05:02:49Z Leveraging Defects Life-Cycle for Labeling Defective Classes Vandehei, Bailey R Data from software repositories are a very useful asset for building different kinds of models and recommender systems aimed at supporting software developers. Specifically, the identification of likely defect-prone files (i.e., classes in Object-Oriented systems) helps in prioritizing testing and analysis activities. This work focuses on automated methods for labeling a class in a version as defective or not. The most used methods for automated class labeling belong to the SZZ family and fail in various circumstances. Thus, recent studies suggest the use of the affect version (AV) as provided by developers and available in an issue tracker such as JIRA. However, in many circumstances, the AV might not be usable because it is unavailable or inconsistent. The aim of this study is twofold: 1) to measure the AV availability and consistency in open-source projects, 2) to propose, evaluate, and compare to SZZ a new method for labeling defective classes, which is based on the idea that defects have a stable life-cycle in terms of the proportion of versions needed to discover the defect and to fix the defect. Results related to 212 open-source projects from the Apache ecosystem, featuring a total of about 125,000 defects, show that the AV cannot be used for the majority (51%) of defects. Therefore, it is important to investigate automated methods for labeling defective classes. Results related to 76 open-source projects from the Apache ecosystem, featuring a total of about 6,250,000 classes that are affected by 60,000 defects and spread over 4,000 versions and 760,000 commits, show that the proposed method for labeling defective classes is, on average among projects and defects, more accurate, in terms of Precision, Kappa, F1, and MCC, than all previously proposed SZZ methods.
Moreover, the improvement in accuracy from combining SZZ with defects life-cycle information is statistically significant but practically irrelevant ( overall and on average, more accurate via defects' life-cycle than any SZZ method. 2019-12-01T08:00:00Z text application/pdf https://digitalcommons.calpoly.edu/theses/2111 https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=3565&context=theses Master's Theses DigitalCommons@CalPoly affect version defect prediction dataset Software Engineering |
collection |
NDLTD |
format |
Others |
sources |
NDLTD |
topic |
affect version defect prediction dataset Software Engineering |
description |
Data from software repositories are a very useful asset for building different kinds of
models and recommender systems aimed at supporting software developers. Specifically,
the identification of likely defect-prone files (i.e., classes in Object-Oriented systems)
helps in prioritizing testing and analysis activities. This work focuses on automated
methods for labeling a class in a version as defective or not. The most used methods
for automated class labeling belong to the SZZ family and fail in various
circumstances. Thus, recent studies suggest the use of the affect version (AV) as provided by
developers and available in an issue tracker such as JIRA. However, in many
circumstances, the AV might not be usable because it is unavailable or inconsistent. The
aim of this study is twofold: 1) to measure the AV availability and consistency in
open-source projects, 2) to propose, evaluate, and compare to SZZ a new method
for labeling defective classes, which is based on the idea that defects have a stable
life-cycle in terms of the proportion of versions needed to discover the defect and to fix
the defect. Results related to 212 open-source projects from the Apache ecosystem,
featuring a total of about 125,000 defects, show that the AV cannot be used for the
majority (51%) of defects. Therefore, it is important to investigate automated
methods for labeling defective classes. Results related to 76 open-source projects from the
Apache ecosystem, featuring a total of about 6,250,000 classes that are affected
by 60,000 defects and spread over 4,000 versions and 760,000 commits, show that the
proposed method for labeling defective classes is, on average among projects and
defects, more accurate, in terms of Precision, Kappa, F1, and MCC, than all previously
proposed SZZ methods. Moreover, the improvement in accuracy from combining SZZ
with defects life-cycle information is statistically significant but practically irrelevant
(
overall and on average, more accurate via defects' life-cycle than any SZZ method. |
author |
Vandehei, Bailey R |
title |
Leveraging Defects Life-Cycle for Labeling Defective Classes |
publisher |
DigitalCommons@CalPoly |
publishDate |
2019 |
url |
https://digitalcommons.calpoly.edu/theses/2111 https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=3565&context=theses |