Applying Machine Learning to Software Fault Prediction

Introduction: Software engineering continuously suffers from inadequate software testing. The automated prediction of possibly faulty fragments of source code allows developers to focus development efforts on fault-prone fragments first. Fault prediction has been a topic of many studies concentratin...

Bibliographic Details
Main Authors:	Bartłomiej Wójcicki, Robert Dąbrowski
Format:	Article
Language:	English
Published:	Wroclaw University of Science and Technology 2018-05-01
Series:	e-Informatica Software Engineering Journal
Subjects:	classifier; fault prediction; machine learning; metric; Naïve Bayes; Python; quality
Online Access:	http://www.e-informatyka.pl/attach/e-Informatica_-_Volume_12/eInformatica2018Art8.pdf

id	doaj-bb1df804a96c4da0911d7aefdb8e9e56
record_format	Article
harvested	2020-11-25T01:07:20Z
language_code	eng
volume	12
issue	1
pages	199-216
doi	10.5277/e-Inf180108
affiliation	Bartłomiej Wójcicki: Institute of Informatics, University of Warsaw
affiliation	Robert Dąbrowski: Institute of Informatics, University of Warsaw
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Bartłomiej Wójcicki, Robert Dąbrowski
title	Applying Machine Learning to Software Fault Prediction
publisher	Wroclaw University of Science and Technology
series	e-Informatica Software Engineering Journal
issn	1897-7979 2084-4840
publishDate	2018-05-01
description	Introduction: Software engineering continually suffers from inadequate software testing. Automated prediction of possibly faulty fragments of source code allows developers to focus their efforts on fault-prone fragments first. Fault prediction has been the topic of many studies concentrating on C/C++ and Java programs, with little attention paid to languages such as Python.
Objectives: In this study the authors aim to verify whether the type of approach used in earlier fault prediction studies can be applied to Python. More precisely, the primary objective is to conduct preliminary research using simple methods that would support (or contradict) the expectation that predicting faults in Python programs is also feasible. The secondary objective is to establish grounds for more thorough future research and publications, provided the preliminary research yields promising results.
Methods: It has been demonstrated that machine learning techniques can predict faults in C/C++ and Java projects with recall 0.71 and false positive rate 0.25. A similar approach was applied to find out whether promising results can be obtained for Python projects. The working hypothesis is that choosing Python as the programming language does not significantly alter those results. A preliminary study was conducted in which a basic machine learning technique was applied to a few sample Python projects. If these efforts succeed, it will indicate that the selected approach is worth pursuing, because results similar to those obtained for C/C++ and Java can be obtained for Python. If they fail, it will indicate that the selected approach was not appropriate for the selected group of Python projects.
Results: The research provides experimental evidence that fault-prediction methods similar to those developed for C/C++ and Java programs can be successfully applied to Python programs, achieving recall up to 0.64 with false positive rate 0.23 (mean recall 0.53 with false positive rate 0.24). This indicates that more thorough research in this area is worth conducting.
Conclusion: Having obtained promising results with this simple approach, the authors conclude that research on predicting faults in Python programs using machine learning techniques is worth conducting. Natural ways to enhance future research include more sophisticated machine learning techniques, additional Python-specific features, and extended data sets.
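The abstract reports its results as recall and false positive rate. As a reading aid, the following minimal sketch shows how these two measures are conventionally computed from binary fault labels. It is not the authors' code: the function name and the ten example labels are hypothetical and are not data from the study.

```python
def recall_and_false_positive_rate(actual, predicted):
    """Return (recall, false positive rate) for binary fault labels.

    actual, predicted: equal-length sequences of booleans, True = faulty.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # faulty, flagged
    fn = sum(1 for a, p in pairs if a and not p)      # faulty, missed
    fp = sum(1 for a, p in pairs if not a and p)      # clean, flagged
    tn = sum(1 for a, p in pairs if not a and not p)  # clean, not flagged

    # Recall: share of truly faulty fragments that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # False positive rate: share of clean fragments wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


# Hypothetical labels for ten code fragments (illustration only).
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]

recall, fpr = recall_and_false_positive_rate(actual, predicted)
print(f"recall={recall:.2f}, false positive rate={fpr:.2f}")
```

Read this way, the reported recall of 0.64 with false positive rate 0.23 means that roughly two thirds of the faulty fragments were flagged while about a quarter of the fault-free fragments were flagged incorrectly.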
topic	classifier; fault prediction; machine learning; metric; Naïve Bayes; Python; quality
url	http://www.e-informatyka.pl/attach/e-Informatica_-_Volume_12/eInformatica2018Art8.pdf

Similar Items