Analyses of Classifier’s Performance Measures Used in Software Fault Prediction Studies

Assessing the quality of software is both important and difficult. For this purpose, software fault prediction (SFP) models have been used extensively. However, selecting the right model and declaring the best among multiple models depend on the performance measures. We analyze 14 non-graphical classifier performance measures frequently used in SFP studies. These analyses will help machine learning practitioners and researchers in SFP select the most appropriate performance measure for model evaluation. We first analyze the performance measures for resilience against producing invalid values through our proposed plausibility criterion. After that, consistency and discriminancy analyses are performed to find the best of the 14 performance measures. Finally, we order the selected performance measures from better to worse on both balanced and imbalanced datasets. Our analyses conclude that the F-measure and the G-mean1 are equally the best candidates for evaluating SFP models, provided the results are analyzed carefully, as there is a risk of invalid values in certain scenarios.

Bibliographic Details
Main Authors: Muhammad Rizwan, Aamer Nadeem, Muddassar Azam Sindhu
Format: Article
Language: English
Published: IEEE, 2019-01-01
Series: IEEE Access
Subjects: Classification; evaluation parameters; machine learning; performance measures; software fault prediction
Online Access:https://ieeexplore.ieee.org/document/8741051/
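The abstract notes that the F-measure and the G-mean1 can produce invalid values in certain scenarios. A minimal Python sketch (not from the article; assuming the common definition G-mean1 = √(precision × recall)) of where those divide-by-zero cases arise from a confusion matrix:

```python
import math

def prf_measures(tp, fp, fn, tn):
    """Return (precision, recall, F-measure, G-mean1); None marks an undefined value."""
    # Precision is undefined when the classifier makes no positive predictions.
    precision = tp / (tp + fp) if (tp + fp) else None
    # Recall is undefined when the dataset contains no actual positives.
    recall = tp / (tp + fn) if (tp + fn) else None
    if precision is None or recall is None or (precision + recall) == 0:
        f_measure = None  # the invalid-value scenario the abstract warns about
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    if precision is None or recall is None:
        g_mean1 = None
    else:
        g_mean1 = math.sqrt(precision * recall)
    return precision, recall, f_measure, g_mean1

# A classifier that flags nothing as faulty: precision (and hence F, G-mean1) is undefined.
print(prf_measures(tp=0, fp=0, fn=5, tn=95))  # → (None, 0.0, None, None)
```

Guarding these denominators (rather than letting them raise `ZeroDivisionError`) makes the "careful analysis of the result" the abstract recommends explicit in code.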
id doaj-58121f0b8f464fd69da701b4fca8a2c6
record_format Article
spelling doaj-58121f0b8f464fd69da701b4fca8a2c6 (indexed 2021-03-30T00:16:04Z)
Language: English; Publisher: IEEE; Journal: IEEE Access; ISSN: 2169-3536; Published: 2019-01-01; Volume: 7; Pages: 82764-82775; DOI: 10.1109/ACCESS.2019.2923821; Article number: 8741051
Title: Analyses of Classifier’s Performance Measures Used in Software Fault Prediction Studies
Authors: Muhammad Rizwan (ORCID: 0000-0002-0855-3465), Department of Computer Science, Capital University of Science and Technology (CUST), Islamabad, Pakistan; Aamer Nadeem, Department of Computer Science, Capital University of Science and Technology (CUST), Islamabad, Pakistan; Muddassar Azam Sindhu (ORCID: 0000-0002-3411-9224), Department of Computer Science, Quaid-i-Azam University (QAU), Islamabad, Pakistan
Abstract: Assessing the quality of software is both important and difficult. For this purpose, software fault prediction (SFP) models have been used extensively. However, selecting the right model and declaring the best among multiple models depend on the performance measures. We analyze 14 non-graphical classifier performance measures frequently used in SFP studies. These analyses will help machine learning practitioners and researchers in SFP select the most appropriate performance measure for model evaluation. We first analyze the performance measures for resilience against producing invalid values through our proposed plausibility criterion. After that, consistency and discriminancy analyses are performed to find the best of the 14 performance measures. Finally, we order the selected performance measures from better to worse on both balanced and imbalanced datasets. Our analyses conclude that the F-measure and the G-mean1 are equally the best candidates for evaluating SFP models, provided the results are analyzed carefully, as there is a risk of invalid values in certain scenarios.
Online Access: https://ieeexplore.ieee.org/document/8741051/
Keywords: Classification; evaluation parameters; machine learning; performance measures; software fault prediction
collection DOAJ
language English
format Article
sources DOAJ
author Muhammad Rizwan
Aamer Nadeem
Muddassar Azam Sindhu
spellingShingle Muhammad Rizwan
Aamer Nadeem
Muddassar Azam Sindhu
Analyses of Classifier’s Performance Measures Used in Software Fault Prediction Studies
IEEE Access
Classification
evaluation parameters
machine learning
performance measures
software fault prediction
author_facet Muhammad Rizwan
Aamer Nadeem
Muddassar Azam Sindhu
author_sort Muhammad Rizwan
title Analyses of Classifier’s Performance Measures Used in Software Fault Prediction Studies
title_short Analyses of Classifier’s Performance Measures Used in Software Fault Prediction Studies
title_full Analyses of Classifier’s Performance Measures Used in Software Fault Prediction Studies
title_fullStr Analyses of Classifier’s Performance Measures Used in Software Fault Prediction Studies
title_full_unstemmed Analyses of Classifier’s Performance Measures Used in Software Fault Prediction Studies
title_sort analyses of classifier’s performance measures used in software fault prediction studies
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description Assessing the quality of software is both important and difficult. For this purpose, software fault prediction (SFP) models have been used extensively. However, selecting the right model and declaring the best among multiple models depend on the performance measures. We analyze 14 non-graphical classifier performance measures frequently used in SFP studies. These analyses will help machine learning practitioners and researchers in SFP select the most appropriate performance measure for model evaluation. We first analyze the performance measures for resilience against producing invalid values through our proposed plausibility criterion. After that, consistency and discriminancy analyses are performed to find the best of the 14 performance measures. Finally, we order the selected performance measures from better to worse on both balanced and imbalanced datasets. Our analyses conclude that the F-measure and the G-mean1 are equally the best candidates for evaluating SFP models, provided the results are analyzed carefully, as there is a risk of invalid values in certain scenarios.
topic Classification
evaluation parameters
machine learning
performance measures
software fault prediction
url https://ieeexplore.ieee.org/document/8741051/
work_keys_str_mv AT muhammadrizwan analysesofclassifierx2019sperformancemeasuresusedinsoftwarefaultpredictionstudies
AT aamernadeem analysesofclassifierx2019sperformancemeasuresusedinsoftwarefaultpredictionstudies
AT muddassarazamsindhu analysesofclassifierx2019sperformancemeasuresusedinsoftwarefaultpredictionstudies
_version_ 1724188440609685504