Applying Machine Learning to Software Fault Prediction
Introduction: Software engineering continuously suffers from inadequate software testing. The automated prediction of possibly faulty fragments of source code allows developers to focus development efforts on fault-prone fragments first. Fault prediction has been a topic of many studies concentratin...
Main Authors: | Bartłomiej Wójcicki, Robert Dąbrowski |
---|---|
Format: | Article |
Language: | English |
Published: | Wroclaw University of Science and Technology, 2018-05-01 |
Series: | e-Informatica Software Engineering Journal |
Subjects: | classifier, fault prediction, machine learning, metric, Naïve Bayes, Python, quality |
Online Access: | http://www.e-informatyka.pl/attach/e-Informatica_-_Volume_12/eInformatica2018Art8.pdf |
id |
doaj-bb1df804a96c4da0911d7aefdb8e9e56 |
---|---|
record_format |
Article |
spelling |
doaj-bb1df804a96c4da0911d7aefdb8e9e56; 2020-11-25T01:07:20Z; eng; Wroclaw University of Science and Technology; e-Informatica Software Engineering Journal; ISSN 1897-7979, 2084-4840; 2018-05-01; vol. 12, no. 1, pp. 199-216; DOI 10.5277/e-Inf180108; Applying Machine Learning to Software Fault Prediction; Bartłomiej Wójcicki (Institute of Informatics, University of Warsaw), Robert Dąbrowski (Institute of Informatics, University of Warsaw); http://www.e-informatyka.pl/attach/e-Informatica_-_Volume_12/eInformatica2018Art8.pdf; classifier, fault prediction, machine learning, metric, Naïve Bayes, Python, quality |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Bartłomiej Wójcicki Robert Dąbrowski |
title |
Applying Machine Learning to Software Fault Prediction |
publisher |
Wroclaw University of Science and Technology |
series |
e-Informatica Software Engineering Journal |
issn |
1897-7979 2084-4840 |
publishDate |
2018-05-01 |
description |
Introduction: Software engineering continuously suffers from inadequate software testing. The automated prediction of possibly faulty fragments of source code allows developers to focus development efforts on fault-prone fragments first. Fault prediction has been a topic of many studies concentrating on C/C++ and Java programs, with little focus on such programming languages as Python. Objectives: In this study the authors want to verify whether the type of approach used in former fault prediction studies can be applied to Python. More precisely, the primary objective is conducting preliminary research using simple methods that would support (or contradict) the expectation that predicting faults in Python programs is also feasible. The secondary objective is establishing grounds for more thorough future research and publications, provided promising results are obtained during the preliminary research. Methods: It has been demonstrated that using machine learning techniques, it is possible to predict faults for C/C++ and Java projects with recall 0.71 and false positive rate 0.25. A similar approach was applied in order to find out if promising results can be obtained for Python projects. The working hypothesis is that choosing Python as a programming language does not significantly alter those results. A preliminary study is conducted and a basic machine learning technique is applied to a few sample Python projects. If these efforts succeed, it will indicate that the selected approach is worth pursuing as it is possible to obtain for Python results similar to the ones obtained for C/C++ and Java. However, if these efforts fail, it will indicate that the selected approach was not appropriate for the selected group of Python projects. 
Results: The research demonstrates experimental evidence that fault-prediction methods similar to those developed for C/C++ and Java programs can be successfully applied to Python programs, achieving recall up to 0.64 with false positive rate 0.23 (mean recall 0.53 with false positive rate 0.24). This indicates that more thorough research in this area is worth conducting. Conclusion: Having obtained promising results using this simple approach, the authors conclude that the research on predicting faults in Python programs using machine learning techniques is worth conducting, natural ways to enhance the future research being: using more sophisticated machine learning techniques, using additional Python-specific features and extended data sets. |
topic |
classifier fault prediction machine learning metric Naïve Bayes Python quality |
url |
http://www.e-informatyka.pl/attach/e-Informatica_-_Volume_12/eInformatica2018Art8.pdf |
_version_ |
1725187688245493760 |
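
The description above reports results as recall and false positive rate. As a minimal sketch of how those two figures are derived from a classifier's predictions (not the authors' code; the function name `recall_and_fpr` and the sample labels are hypothetical, chosen only for illustration), both can be computed from the four confusion-matrix counts:

```python
from typing import Sequence, Tuple


def recall_and_fpr(actual: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (recall, false positive rate) for binary fault labels.

    True means "faulty". Hypothetical helper for illustration only.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # faulty, flagged
    fn = sum(1 for a, p in pairs if a and not p)      # faulty, missed
    fp = sum(1 for a, p in pairs if not a and p)      # clean, flagged
    tn = sum(1 for a, p in pairs if not a and not p)  # clean, not flagged

    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both faulty and clean fragments are required")

    recall = tp / (tp + fn)               # share of faulty fragments that were found
    false_positive_rate = fp / (fp + tn)  # share of clean fragments wrongly flagged
    return recall, false_positive_rate


# Made-up labels for eight code fragments (not data from the study).
actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]

recall, fpr = recall_and_fpr(actual, predicted)
print(f"recall={recall:.2f}, false positive rate={fpr:.2f}")
```

Read this way, a recall of 0.64 with a false positive rate of 0.23 means that roughly two thirds of the faulty fragments were flagged, while about a quarter of the clean fragments were flagged unnecessarily.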