Applying Machine Learning to Software Fault Prediction

Bibliographic Details
Main Authors: Bartłomiej Wójcicki, Robert Dąbrowski
Format: Article
Language: English
Published: Wroclaw University of Science and Technology, 2018-05-01
Series: e-Informatica Software Engineering Journal
Online Access: http://www.e-informatyka.pl/attach/e-Informatica_-_Volume_12/eInformatica2018Art8.pdf
Affiliations: Institute of Informatics, University of Warsaw (both authors)
Volume/Issue: Vol. 12, No. 1, pp. 199-216
ISSN: 1897-7979, 2084-4840
DOI: 10.5277/e-Inf180108
Keywords: classifier, fault prediction, machine learning, metric, Naïve Bayes, Python, quality

Abstract

Introduction: Software engineering continuously suffers from inadequate software testing. Automatically predicting which fragments of source code are likely to be faulty allows developers to focus their efforts on fault-prone fragments first. Fault prediction has been the topic of many studies of C/C++ and Java programs, but languages such as Python have received little attention.

Objectives: The authors aim to verify whether the approach used in earlier fault prediction studies can be applied to Python. More precisely, the primary objective is to conduct preliminary research, using simple methods, that supports (or contradicts) the expectation that predicting faults in Python programs is also feasible. The secondary objective is to establish grounds for more thorough future research and publications, provided the preliminary research yields promising results.

Methods: Earlier work has demonstrated that machine learning techniques can predict faults in C/C++ and Java projects with a recall of 0.71 and a false positive rate of 0.25. A similar approach is applied here to find out whether promising results can also be obtained for Python projects. The working hypothesis is that choosing Python as the programming language does not significantly alter those results. In a preliminary study, a basic machine learning technique is applied to a few sample Python projects. Success would indicate that the approach is worth pursuing, since it yields results for Python similar to those obtained for C/C++ and Java; failure would indicate that the approach is not appropriate for the selected group of Python projects.

Results: The research provides experimental evidence that fault prediction methods similar to those developed for C/C++ and Java programs can be successfully applied to Python programs, achieving a recall of up to 0.64 with a false positive rate of 0.23 (mean recall 0.53 with a false positive rate of 0.24). This indicates that more thorough research in this area is worth conducting.

Conclusion: Having obtained promising results with this simple approach, the authors conclude that research on predicting faults in Python programs using machine learning techniques is worth pursuing. Natural directions for future work are more sophisticated machine learning techniques, additional Python-specific features, and extended data sets.
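
The abstract reports its results as recall and false positive rate, and the record's keywords name Naïve Bayes as the classifier. The following is a minimal, illustrative sketch of that kind of pipeline, not the authors' implementation: it assumes a Gaussian Naive Bayes model, two hypothetical per-file metrics (lines of code and number of commits), and made-up toy data; none of the names or values come from the study.

```python
import math
from collections import defaultdict


def recall_and_fpr(actual, predicted):
    """Recall = TP / (TP + FN); false positive rate = FP / (FP + TN).

    Labels: 1 = faulty file, 0 = clean file.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: features are treated as conditionally
    independent given the class and normally distributed within each class."""

    def fit(self, rows, labels):
        by_class = defaultdict(list)
        for row, label in zip(rows, labels):
            by_class[label].append(row)
        self.priors = {}
        self.stats = {}  # class label -> [(mean, variance) per feature]
        for label, members in by_class.items():
            self.priors[label] = len(members) / len(rows)
            self.stats[label] = []
            for column in zip(*members):
                mean = sum(column) / len(column)
                # A tiny epsilon keeps the variance positive for constant features.
                var = sum((x - mean) ** 2 for x in column) / len(column) + 1e-9
                self.stats[label].append((mean, var))
        return self

    def predict_one(self, row):
        # Pick the class with the highest log prior + sum of log likelihoods.
        best_label, best_score = None, -math.inf
        for label, prior in self.priors.items():
            score = math.log(prior)
            for x, (mean, var) in zip(row, self.stats[label]):
                score -= 0.5 * math.log(2 * math.pi * var) + (x - mean) ** 2 / (2 * var)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, rows):
        return [self.predict_one(row) for row in rows]


# Hypothetical toy data: [lines of code, number of commits] per file.
train_X = [[50, 1], [80, 2], [60, 1], [400, 9], [350, 7], [500, 12]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[70, 1], [450, 10], [300, 3], [90, 2], [65, 1], [380, 8]]
test_y = [0, 1, 1, 0, 1, 0]

model = GaussianNaiveBayes().fit(train_X, train_y)
recall, fpr = recall_and_fpr(test_y, model.predict(test_X))
print(f"recall={recall:.2f} fpr={fpr:.2f}")  # prints recall=0.67 fpr=0.33
```

On this toy split one small faulty file is missed (a false negative) and one large clean file is flagged (a false positive). Reporting both rates together matters because recall alone can be pushed to 1.0 trivially by flagging every file, at the cost of a false positive rate of 1.0.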