A study on the selection error rate of classification algorithms evaluated by k-fold cross validation.

Master's thesis === National Cheng Kung University === Institute of Information Management === Academic year 102 === The performance of a classification algorithm is generally evaluated by K-fold cross validation to find the algorithm with the highest accuracy. The model induced from all available data by the best classification algorithm, called the full sample model, is then used...

Bibliographic Details
Main Authors: Chiao-Ying Lin (林巧盈)
Other Authors: Tzu-Tsung Wong
Format: Others
Language: zh-TW
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/23699989925707105417
id ndltd-TW-102NCKU5396012
record_format oai_dc
spelling ndltd-TW-102NCKU5396012 2016-03-07T04:10:57Z http://ndltd.ncl.edu.tw/handle/23699989925707105417 A study on the selection error rate of classification algorithms evaluated by k-fold cross validation. 探討K等分交叉驗證法對於分類器錯選率之研究 (Chinese title) Chiao-Ying Lin 林巧盈 Master's thesis, National Cheng Kung University, Institute of Information Management, academic year 102. Tzu-Tsung Wong 翁慈宗. 2014. Thesis, 55 pages, zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === National Cheng Kung University === Institute of Information Management === Academic year 102 === The performance of a classification algorithm is generally evaluated by K-fold cross validation to find the algorithm with the highest accuracy. The model induced from all available data by the best classification algorithm, called the full sample model, is then used for prediction and interpretation. Since no extra data remain to evaluate the full sample model resulting from the best algorithm, its prediction accuracy can be lower than that of the full sample model induced by another classification algorithm; this is called a selection error. This study designs an experiment to calculate and estimate the selection error rate, and proposes a new model for reducing it. The classification algorithms considered are decision tree, naïve Bayesian classifier, logistic regression, and support vector machine. Experimental results on 30 data sets show that the actual and estimated selection error rates can differ greatly in several cases. The proposed model, which has the median accuracy, can reduce the selection error rate without sacrificing prediction accuracy.
author2 Tzu-Tsung Wong
author Chiao-Ying Lin
林巧盈
title A study on the selection error rate of classification algorithms evaluated by k-fold cross validation.
publishDate 2014
url http://ndltd.ncl.edu.tw/handle/23699989925707105417
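The selection procedure summarized in the description field above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's own code: all helper names (`kfold_indices`, `kfold_accuracy`, `majority_learner`, `is_selection_error`, `selection_error_rate`) are hypothetical, the folds are contiguous (so the data are assumed to be shuffled beforehand), and `full_acc` assumes the accuracy of each full sample model has been measured on data outside the training sample, which an experiment would have to simulate or hold out. The accuracies in the usage block are invented, not results from the study.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Sequence

# Hypothetical learner interface: train on (X_train, y_train), return predictions for X_test.
Learner = Callable[[Sequence, Sequence, Sequence], list]


def kfold_indices(n: int, k: int) -> list[list[int]]:
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one.

    Assumes the data were shuffled (or stratified) beforehand.
    """
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # the first `extra` folds get one more item
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def kfold_accuracy(learner: Learner, X: Sequence, y: Sequence, k: int) -> float:
    """Pooled K-fold accuracy: each instance is predicted once by a model trained without its fold."""
    correct = 0
    for fold in kfold_indices(len(y), k):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        preds = learner([X[i] for i in train], [y[i] for i in train], [X[i] for i in fold])
        correct += sum(p == y[i] for p, i in zip(preds, fold))
    return correct / len(y)


def majority_learner(X_train: Sequence, y_train: Sequence, X_test: Sequence) -> list:
    """Toy stand-in for a real algorithm: always predicts the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


def is_selection_error(cv_acc: dict[str, float], full_acc: dict[str, float]) -> bool:
    """True if the algorithm chosen by highest CV accuracy does not give the most accurate
    full sample model. CV ties go to the first key in insertion order."""
    chosen = max(cv_acc, key=cv_acc.get)
    return full_acc[chosen] < max(full_acc.values())


def selection_error_rate(trials: Sequence[tuple[dict[str, float], dict[str, float]]]) -> float:
    """Fraction of (cv_acc, full_acc) trials in which a selection error occurs."""
    return sum(is_selection_error(cv, full) for cv, full in trials) / len(trials)


if __name__ == "__main__":
    # Invented accuracies for the four algorithms named in the abstract.
    trials = [
        ({"tree": 0.80, "nb": 0.78, "lr": 0.79, "svm": 0.77},
         {"tree": 0.81, "nb": 0.80, "lr": 0.83, "svm": 0.79}),  # CV picks tree, lr is better: error
        ({"tree": 0.75, "nb": 0.82, "lr": 0.80, "svm": 0.79},
         {"tree": 0.74, "nb": 0.84, "lr": 0.81, "svm": 0.80}),  # CV picks nb, nb is best: no error
    ]
    print(selection_error_rate(trials))  # 0.5
    print(kfold_accuracy(majority_learner, list(range(10)), [0] * 7 + [1] * 3, k=5))  # 0.7
```

The toy `majority_learner` only stands in for the decision tree, naïve Bayesian classifier, logistic regression, and support vector machine studied here; any function with the same hypothetical interface can be passed to `kfold_accuracy`. `is_selection_error` follows the abstract's definition: the selected algorithm's full sample model is less accurate than the full sample model of some other algorithm.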