Identify High-Impact Bug Reports by Combining the Data Reduction and Imbalanced Learning Strategies
As software systems become increasingly large, the logic becomes more complex, resulting in a large number of bug reports being submitted to the bug repository daily. Due to tight schedules and limited human resources, developers may not have enough time to inspect all the bugs. Thus, they often con...
Main Authors: | Shikai Guo, Miaomiao Wei, Siwen Wang, Rong Chen, Chen Guo, Hui Li, Tingting Li |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2019-09-01 |
Series: | Applied Sciences |
Subjects: | high-impact bug reports; class imbalance; feature selection; instance selection |
Online Access: | https://www.mdpi.com/2076-3417/9/18/3663 |
id |
doaj-535fbd489c0049cc8e13424c9a9b6684 |
---|---|
record_format |
Article |
spelling |
Record: doaj-535fbd489c0049cc8e13424c9a9b6684; 2020-11-24T20:52:50Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2019-09-01; vol. 9, no. 18, art. 3663; DOI 10.3390/app9183663; article ID app9183663.
Title: Identify High-Impact Bug Reports by Combining the Data Reduction and Imbalanced Learning Strategies.
Authors and affiliations:
Shikai Guo (The College of Marine Electrical Engineering, Dalian Maritime University, Dalian 116026, China);
Miaomiao Wei (The College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China);
Siwen Wang (The College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China);
Rong Chen (The College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China);
Chen Guo (The College of Marine Electrical Engineering, Dalian Maritime University, Dalian 116026, China);
Hui Li (The College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China);
Tingting Li (Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China).
URL: https://www.mdpi.com/2076-3417/9/18/3663
Keywords: high-impact bug reports; class imbalance; feature selection; instance selection |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Shikai Guo, Miaomiao Wei, Siwen Wang, Rong Chen, Chen Guo, Hui Li, Tingting Li |
title |
Identify High-Impact Bug Reports by Combining the Data Reduction and Imbalanced Learning Strategies |
publisher |
MDPI AG |
series |
Applied Sciences |
issn |
2076-3417 |
publishDate |
2019-09-01 |
description |
As software systems become increasingly large, the logic becomes more complex, resulting in a large number of bug reports being submitted to the bug repository daily. Due to tight schedules and limited human resources, developers may not have enough time to inspect all the bugs. Thus, they often concentrate on the bugs that have large impacts. However, there are two main challenges limiting the automation technology that would help developers to become aware of high-impact bug reports early, namely, low quality and class distribution imbalance. To address these two challenges, we propose an approach to identify high-impact bug reports that combines the data reduction and imbalanced learning strategies. In the data reduction phase, we combine feature selection with the instance selection method to build a small-scale and high-quality set of bug reports by removing the bug reports and words that are redundant or noninformative; in the imbalanced learning strategies phase, we handle the imbalanced distributions of bug reports through four imbalanced learning strategies. We experimentally verified that the method of combining the data reduction and imbalanced learning strategies could effectively identify high-impact bug reports. |
topic |
high-impact bug reports; class imbalance; feature selection; instance selection |
url |
https://www.mdpi.com/2076-3417/9/18/3663 |
_version_ |
1716798822535921664 |
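
The two-phase approach summarized in the description above (data reduction, then imbalanced learning) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, since the record does not name the specific algorithms: a chi-square word ranking stands in for feature selection, dropping empty and duplicate reports stands in for instance selection, and random oversampling stands in for one of the four imbalanced learning strategies. The function names and toy reports are hypothetical.

```python
import random
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (assumed preprocessing)."""
    return text.lower().split()


def chi2_scores(docs, labels):
    """Chi-square score of each word's presence against the binary label."""
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    pos_df, neg_df = Counter(), Counter()
    for words, y in zip(docs, labels):
        (pos_df if y == 1 else neg_df).update(set(words))
    scores = {}
    for w in set(pos_df) | set(neg_df):
        a, b = pos_df[w], neg_df[w]      # reports containing w, per class
        c, d = n_pos - a, n_neg - b      # reports lacking w, per class
        denom = (a + b) * (c + d) * n_pos * n_neg
        scores[w] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def reduce_data(docs, labels, k):
    """Feature selection (top-k words), then a simple instance selection."""
    scores = chi2_scores(docs, labels)
    keep = set(sorted(scores, key=lambda w: (-scores[w], w))[:k])
    seen, out_docs, out_labels = set(), [], []
    for words, y in zip(docs, labels):
        kept = frozenset(w for w in words if w in keep)
        # Drop reports left with no informative words, and exact duplicates.
        if not kept or (kept, y) in seen:
            continue
        seen.add((kept, y))
        out_docs.append(sorted(kept))
        out_labels.append(y)
    return out_docs, out_labels


def random_oversample(docs, labels, seed=0):
    """Duplicate minority-class reports at random until classes are balanced."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for words, y in zip(docs, labels):
        by_class[y].append(words)
    target = max(len(items) for items in by_class.values())
    out_docs, out_labels = [], []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))] if items else []
        for words in items + extra:
            out_docs.append(words)
            out_labels.append(y)
    return out_docs, out_labels


# Toy data: label 1 marks a high-impact report (hypothetical examples).
reports = [
    ("crash on startup data loss", 1),
    ("typo in settings label", 0),
    ("button color looks wrong", 0),
    ("typo in help text", 0),
    ("minor typo in settings label", 0),
    ("crash when saving file", 1),
]
docs = [tokenize(text) for text, _ in reports]
labels = [y for _, y in reports]

small_docs, small_labels = reduce_data(docs, labels, k=5)
bal_docs, bal_labels = random_oversample(small_docs, small_labels)
```

The balanced, reduced set would then be passed to any classifier. Running reduction before resampling keeps the oversampler from duplicating reports that carry only noninformative words, which is one plausible reason to order the phases this way.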