Identify High-Impact Bug Reports by Combining the Data Reduction and Imbalanced Learning Strategies

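As software systems become increasingly large, the logic becomes more complex, resulting in a large number of bug reports being submitted to the bug repository daily. Due to tight schedules and limited human resources, developers may not have enough time to inspect all the bugs. Thus, they often concentrate on the bugs that have large impacts. However, there are two main challenges limiting the automation technology that would help developers to become aware of high-impact bug reports early, namely, low quality and class distribution imbalance. To address these two challenges, we propose an approach to identify high-impact bug reports that combines the data reduction and imbalanced learning strategies. In the data reduction phase, we combine feature selection with the instance selection method to build a small-scale and high-quality set of bug reports by removing the bug reports and words that are redundant or noninformative; in the imbalanced learning strategies phase, we handle the imbalanced distributions of bug reports through four imbalanced learning strategies. We experimentally verified that the method of combining the data reduction and imbalanced learning strategies could effectively identify high-impact bug reports.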
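The two phases named in the abstract (data reduction, then imbalanced learning) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: the toy reports, the document-frequency-gap score used for feature selection, duplicate removal as the instance selection step, and random oversampling as a stand-in for one of the four imbalanced learning strategies, which this record does not name.

```python
import random
from collections import Counter


def select_features(reports, k):
    """Keep the k words whose document frequency differs most between classes.

    This score is a simple illustrative stand-in, not the article's method.
    """
    pos, neg = Counter(), Counter()
    n_pos = sum(1 for _, label in reports if label == 1)
    n_neg = len(reports) - n_pos
    for words, label in reports:
        (pos if label == 1 else neg).update(set(words))

    def score(word):
        return abs(pos[word] / max(n_pos, 1) - neg[word] / max(n_neg, 1))

    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(set(pos) | set(neg), key=lambda w: (-score(w), w))
    return set(ranked[:k])


def select_instances(reports, vocab):
    """Project each report onto vocab; drop empty and duplicate reports."""
    seen, kept = set(), []
    for words, label in reports:
        reduced = tuple(sorted(set(words) & vocab))
        if reduced and (reduced, label) not in seen:
            seen.add((reduced, label))
            kept.append((list(reduced), label))
    return kept


def oversample(reports, seed=0):
    """Randomly duplicate minority-class reports until both classes match."""
    rng = random.Random(seed)
    pos = [r for r in reports if r[1] == 1]
    neg = [r for r in reports if r[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(reports)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(reports) + extra


# Hypothetical toy data: label 1 = high-impact, label 0 = ordinary.
reports = [
    (["crash", "on", "startup"], 1),
    (["data", "loss", "on", "save"], 1),
    (["typo", "in", "menu"], 0),
    (["typo", "in", "menu"], 0),
    (["typo", "on", "page"], 0),
    (["color", "of", "button"], 0),
    (["slow", "scroll", "on", "page"], 0),
    (["button", "color", "wrong"], 0),
]

vocab = select_features(reports, k=8)        # phase 1a: feature selection
reduced = select_instances(reports, vocab)   # phase 1b: instance selection
balanced = oversample(reduced, seed=0)       # phase 2: rebalance the classes
```

The balanced set would then be passed to any classifier. Running feature selection before instance selection is only one possible ordering and is chosen here for brevity.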

Bibliographic Details
Main Authors: Shikai Guo, Miaomiao Wei, Siwen Wang, Rong Chen, Chen Guo, Hui Li, Tingting Li
Format: Article
Language: English
Published: MDPI AG 2019-09-01
Series: Applied Sciences
Subjects: high-impact bug reports; class imbalance; feature selection; instance selection
Online Access: https://www.mdpi.com/2076-3417/9/18/3663
id doaj-535fbd489c0049cc8e13424c9a9b6684
record_format Article
spelling doaj-535fbd489c0049cc8e13424c9a9b6684
record timestamp 2020-11-24T20:52:50Z
language code eng
volume 9
issue 18
article number 3663
doi 10.3390/app9183663
article id app9183663
author affiliations Shikai Guo (0)
Miaomiao Wei (1)
Siwen Wang (2)
Rong Chen (3)
Chen Guo (4)
Hui Li (5)
Tingting Li (6)
(0) The College of Marine Electrical Engineering, Dalian Maritime University, Dalian 116026, China
(1) The College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
(2) The College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
(3) The College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
(4) The College of Marine Electrical Engineering, Dalian Maritime University, Dalian 116026, China
(5) The College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
(6) Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
collection DOAJ
language English
format Article
sources DOAJ
author Shikai Guo
Miaomiao Wei
Siwen Wang
Rong Chen
Chen Guo
Hui Li
Tingting Li
title Identify High-Impact Bug Reports by Combining the Data Reduction and Imbalanced Learning Strategies
publisher MDPI AG
series Applied Sciences
issn 2076-3417
publishDate 2019-09-01
description As software systems become increasingly large, the logic becomes more complex, resulting in a large number of bug reports being submitted to the bug repository daily. Due to tight schedules and limited human resources, developers may not have enough time to inspect all the bugs. Thus, they often concentrate on the bugs that have large impacts. However, there are two main challenges limiting the automation technology that would help developers to become aware of high-impact bug reports early, namely, low quality and class distribution imbalance. To address these two challenges, we propose an approach to identify high-impact bug reports that combines the data reduction and imbalanced learning strategies. In the data reduction phase, we combine feature selection with the instance selection method to build a small-scale and high-quality set of bug reports by removing the bug reports and words that are redundant or noninformative; in the imbalanced learning strategies phase, we handle the imbalanced distributions of bug reports through four imbalanced learning strategies. We experimentally verified that the method of combining the data reduction and imbalanced learning strategies could effectively identify high-impact bug reports.
topic high-impact bug reports
class imbalance
feature selection
instance selection
url https://www.mdpi.com/2076-3417/9/18/3663