Performance evaluation of supervised learning algorithms with various training data sizes and missing attributes

Supervised learning is a machine learning technique used for creating a data prediction model. This article focuses on finding high-performance supervised learning algorithms under varied training data sizes, varied numbers of attributes, and time spent on prediction. This study evaluated seven algo...

Bibliographic Details
Main Authors:	Chaluemwut Noyunsan, Tatpong Katanyukul, Kanda Saikaew
Format:	Article
Language:	English
Published:	Khon Kaen University 2018-09-01
Series:	Engineering and Applied Science Research
Subjects:	Supervised learning algorithms; Evaluation metrics; Performance comparison
Online Access:	https://www.tci-thaijo.org/index.php/easr/article/download/88019/107554/

id	doaj-75918f5ecb2642959d406e59051d1785
record_format	Article
spelling	doaj-75918f5ecb2642959d406e59051d1785; 2020-11-24T22:19:46Z; eng; Khon Kaen University; Engineering and Applied Science Research; ISSN 2539-6161, 2539-6218; 2018-09-01; vol. 45, no. 3, pp. 221-229; doi:10.14456/easr.2018.28; Performance evaluation of supervised learning algorithms with various training data sizes and missing attributes; Chaluemwut Noyunsan, Tatpong Katanyukul, Kanda Saikaew; https://www.tci-thaijo.org/index.php/easr/article/download/88019/107554/; Supervised learning algorithms; Evaluation metrics; Performance comparison
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Chaluemwut Noyunsan, Tatpong Katanyukul, Kanda Saikaew
author_sort	Chaluemwut Noyunsan
title	Performance evaluation of supervised learning algorithms with various training data sizes and missing attributes
publisher	Khon Kaen University
series	Engineering and Applied Science Research
issn	2539-6161 2539-6218
publishDate	2018-09-01
description	Supervised learning is a machine learning technique used for creating a data prediction model. This article focuses on finding high-performance supervised learning algorithms under varied training data sizes, varied numbers of attributes, and time spent on prediction. This study evaluated seven algorithms (Boosting, Random Forest, Bagging, Naive Bayes, K-Nearest Neighbours (K-NN), Decision Tree, and Support Vector Machine (SVM)) on seven standard benchmark data sets from the University of California, Irvine (UCI), using two evaluation metrics and experimental settings with various training data sizes and missing key attributes. Our findings reveal that Bagging, Random Forest, and SVM are overall the three most accurate algorithms. However, when the presence of key attribute values is a concern, K-NN is recommended because its performance is affected the least. Alternatively, when the training data may not be large enough, Naive Bayes is preferable since it is the algorithm least sensitive to training data size. The algorithms are characterized on a two-dimensional chart based on prediction performance and computation time. This chart is expected to guide a novice user in choosing an appropriate method for his/her needs. Based on this chart, Bagging and Random Forest are in general the two most recommended algorithms because of their high performance and speed.
topic	Supervised learning algorithms; Evaluation metrics; Performance comparison
url	https://www.tci-thaijo.org/index.php/easr/article/download/88019/107554/

Similar Items