Leveraging Defects Life-Cycle for Labeling Defective Classes

Data from software repositories are a very useful asset for building different kinds of models and recommender systems aimed at supporting software developers. Specifically, the identification of likely defect-prone files (i.e., classes in Object-Oriented systems) helps in prioritizing, testing, and analysis ac...

Bibliographic Details
Main Author: Vandehei, Bailey R
Format: Others
Published: DigitalCommons@CalPoly 2019
Subjects: affect version; defect prediction; dataset; Software Engineering
Online Access: https://digitalcommons.calpoly.edu/theses/2111
https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=3565&context=theses
id ndltd-CALPOLY-oai-digitalcommons.calpoly.edu-theses-3565
record_format oai_dc
spelling ndltd-CALPOLY-oai-digitalcommons.calpoly.edu-theses-3565 2021-08-20T05:02:49Z Leveraging Defects Life-Cycle for Labeling Defective Classes Vandehei, Bailey R Data from software repositories are a very useful asset for building different kinds of models and recommender systems aimed at supporting software developers. Specifically, the identification of likely defect-prone files (i.e., classes in Object-Oriented systems) helps in prioritizing, testing, and analysis activities. This work focuses on automated methods for labeling a class in a version as defective or not. The most widely used methods for automated class labeling belong to the SZZ family and fail in various circumstances. Thus, recent studies suggest the use of the affect version (AV) as provided by developers and available in an issue tracker such as JIRA. However, in many circumstances, the AV might not be used because it is unavailable or inconsistent. The aim of this study is twofold: 1) to measure the AV availability and consistency in open-source projects, 2) to propose, evaluate, and compare to SZZ, a new method for labeling defective classes which is based on the idea that defects have a stable life-cycle in terms of the proportion of versions needed to discover the defect and to fix the defect. Results related to 212 open-source projects from the Apache ecosystem, featuring a total of about 125,000 defects, show that the AV cannot be used for the majority (51%) of defects. Therefore, it is important to investigate automated methods for labeling defective classes. Results related to 76 open-source projects from the Apache ecosystem, featuring a total of about 6,250,000 classes that are affected by 60,000 defects and spread over 4,000 versions and 760,000 commits, show that the proposed method for labeling defective classes is, on average among projects and defects, more accurate in terms of Precision, Kappa, F1, and MCC than all previously proposed SZZ methods.
Moreover, the improvement in accuracy from combining SZZ with defects life-cycle information is statistically significant but practically irrelevant ( overall and on average, more accurate via defects' life-cycle than any SZZ method. 2019-12-01T08:00:00Z text application/pdf https://digitalcommons.calpoly.edu/theses/2111 https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=3565&context=theses Master's Theses DigitalCommons@CalPoly affect version defect prediction dataset Software Engineering
collection NDLTD
format Others
sources NDLTD
topic affect version
defect prediction
dataset
Software Engineering
spellingShingle affect version
defect prediction
dataset
Software Engineering
Vandehei, Bailey R
Leveraging Defects Life-Cycle for Labeling Defective Classes
description Data from software repositories are a very useful asset for building different kinds of models and recommender systems aimed at supporting software developers. Specifically, the identification of likely defect-prone files (i.e., classes in Object-Oriented systems) helps in prioritizing, testing, and analysis activities. This work focuses on automated methods for labeling a class in a version as defective or not. The most widely used methods for automated class labeling belong to the SZZ family and fail in various circumstances. Thus, recent studies suggest the use of the affect version (AV) as provided by developers and available in an issue tracker such as JIRA. However, in many circumstances, the AV might not be used because it is unavailable or inconsistent. The aim of this study is twofold: 1) to measure the AV availability and consistency in open-source projects, 2) to propose, evaluate, and compare to SZZ, a new method for labeling defective classes which is based on the idea that defects have a stable life-cycle in terms of the proportion of versions needed to discover the defect and to fix the defect. Results related to 212 open-source projects from the Apache ecosystem, featuring a total of about 125,000 defects, show that the AV cannot be used for the majority (51%) of defects. Therefore, it is important to investigate automated methods for labeling defective classes. Results related to 76 open-source projects from the Apache ecosystem, featuring a total of about 6,250,000 classes that are affected by 60,000 defects and spread over 4,000 versions and 760,000 commits, show that the proposed method for labeling defective classes is, on average among projects and defects, more accurate in terms of Precision, Kappa, F1, and MCC than all previously proposed SZZ methods.
Moreover, the improvement in accuracy from combining SZZ with defects life-cycle information is statistically significant but practically irrelevant ( overall and on average, more accurate via defects' life-cycle than any SZZ method.
author Vandehei, Bailey R
author_facet Vandehei, Bailey R
author_sort Vandehei, Bailey R
title Leveraging Defects Life-Cycle for Labeling Defective Classes
title_short Leveraging Defects Life-Cycle for Labeling Defective Classes
title_full Leveraging Defects Life-Cycle for Labeling Defective Classes
title_fullStr Leveraging Defects Life-Cycle for Labeling Defective Classes
title_full_unstemmed Leveraging Defects Life-Cycle for Labeling Defective Classes
title_sort leveraging defects life-cycle for labeling defective classes
publisher DigitalCommons@CalPoly
publishDate 2019
url https://digitalcommons.calpoly.edu/theses/2111
https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=3565&context=theses
work_keys_str_mv AT vandeheibaileyr leveragingdefectslifecycleforlabelingdefectiveclasses
_version_ 1719460520835153920