The Impact of Cross-Validation Methods on the Performance Estimates of Classification Algorithms

Master's thesis === National Cheng Kung University === Institute of Information Management === 106 === Cross-validation is a popular approach for evaluating the performance of classification algorithms. The variance of the accuracy estimate resulting from k-fold cross-validation is generally rather large, and several evaluation methods have therefore been developed to reduce this variance. When a data set is processed by two evaluation methods for the same classification algorithm, the resulting accuracies may not be independent, which complicates performance comparison. The purpose of this research is to propose statistical methods for comparing the performance of various evaluation methods. When a data set is classified by the same algorithm, an independence test for two binary random variables is first introduced to identify whether the predictions for the same instance under two evaluation methods are independent. Statistical methods are then proposed for comparing the performance of a classification algorithm on a single data set, or on multiple data sets, processed by two dependent evaluation methods. Two classification algorithms, decision tree induction and k-nearest neighbor, are chosen to test the performance of four evaluation methods. The experimental results of the independence test on twenty ordinary data sets show that the predictions of instances under various evaluation methods are generally dependent, and the results of our statistical methods suggest that the accuracy estimates resulting from various evaluation methods are not significantly different. Nonparametric statistical methods are employed to test the variance of accuracy for ordinary data sets and the mean and variance of the F-measure for imbalanced data sets. Those tests also indicate that the performance of a classification algorithm will not be significantly different when data sets are processed by various evaluation methods.
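The record reproduces only the abstract; the thesis's own code and exact test statistics are not available here. As a minimal, self-contained sketch of the kind of procedure the abstract describes — running two cross-validation schemes on the same data with the same classifier, then testing whether the paired per-instance correct/incorrect outcomes are independent — one might write the following. All function names, the 1-nearest-neighbor classifier, the toy data, and the plain Pearson chi-square form are illustrative assumptions, not the thesis's actual method:

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k roughly equal, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def one_nn_predict(train_X, train_y, x):
    """Plain 1-nearest-neighbour prediction (squared Euclidean distance)."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]

def cv_correctness(X, y, k, seed=0):
    """k-fold CV; return a per-instance 1/0 vector (prediction correct or not)."""
    correct = [0] * len(X)
    for fold in k_fold_indices(len(X), k, seed):
        fold_set = set(fold)
        train = [i for i in range(len(X)) if i not in fold_set]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        for i in fold:
            correct[i] = int(one_nn_predict(tX, ty, X[i]) == y[i])
    return correct

def chi_square_paired(a, b):
    """Pearson chi-square statistic (1 df) for the 2x2 table of two paired
    binary vectors; values above ~3.84 reject independence at the 5% level."""
    n = len(a)
    table = [[0, 0], [0, 0]]
    for ai, bi in zip(a, b):
        table[ai][bi] += 1
    stat = 0.0
    for r in (0, 1):
        for c in (0, 1):
            expected = (table[r][0] + table[r][1]) * (table[0][c] + table[1][c]) / n
            if expected > 0:
                stat += (table[r][c] - expected) ** 2 / expected
    return stat

# Toy two-class data set: two Gaussian clusters in the plane.
rng = random.Random(1)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
     + [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)])
y = [0] * 20 + [1] * 20

# Two evaluation methods on the same data and algorithm: 5-fold vs 10-fold CV.
correct_5 = cv_correctness(X, y, k=5)
correct_10 = cv_correctness(X, y, k=10)
print("5-fold accuracy :", sum(correct_5) / len(X))
print("10-fold accuracy:", sum(correct_10) / len(X))
print("chi-square stat :", chi_square_paired(correct_5, correct_10))
```

When the two correctness vectors turn out to be dependent, the usual independent-samples comparison of the two accuracy estimates is not valid — which is the motivation the abstract gives for the proposed paired statistical tests.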


Bibliographic Details
Main Author: Min-Ru Wei, 魏敏如
Other Authors: Tzu-Tsung Wong
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/uj4ny9
id ndltd-TW-106NCKU5396008
record_format oai_dc
spelling ndltd-TW-106NCKU53960082019-07-25T04:46:49Z http://ndltd.ncl.edu.tw/handle/uj4ny9 The Impact of Cross-Validation Methods on the Performance Estimates of Classification Algorithms 交叉驗證評估法對分類方法效能估計值之影響 Min-Ru Wei 魏敏如 Tzu-Tsung Wong 翁慈宗 2018 學位論文 ; thesis 61 zh-TW
collection NDLTD
sources NDLTD