The Impact of Cross-Validation Methods on the Performance Estimates of Classification Algorithms

Master's thesis === National Cheng Kung University === Institute of Information Management === 106 === Cross-validation is a popular approach for evaluating the performance of classification algorithms. The variance of the accuracy estimate resulting from k-fold cross-validation is generally rather large, and several evaluation methods have therefore been developed to reduce this variance. When a data set is processed by two evaluation methods for the same classification algorithm, the resulting accuracies may not be independent, which complicates performance comparison. The purpose of this research is to propose statistical methods for comparing the performance of various evaluation methods. When a data set is classified by the same algorithm, an independence test for two binary random variables is first introduced to identify whether the predictions for the same instance under two evaluation methods are independent. Statistical methods are then proposed for comparing the performance of a classification algorithm on a single data set, or on multiple data sets, processed by two dependent evaluation methods. Two classification algorithms, decision tree induction and k-nearest neighbor, are chosen to test the performance of four evaluation methods. The experimental results of the independence test on twenty ordinary data sets show that the predictions of instances under various evaluation methods are generally dependent, and the results of our statistical methods suggest that the accuracy estimates resulting from various evaluation methods are not significantly different. Nonparametric statistical methods are employed to test the variance of accuracy for ordinary data sets and the mean and variance of the F-measure for imbalanced data sets. Those tests also indicate that the performance of a classification algorithm will not be significantly different when data sets are processed by various evaluation methods.
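The record reproduces only the abstract; the thesis's own code and exact test statistics are not available here. As a minimal, self-contained sketch of the kind of procedure the abstract describes — running two cross-validation schemes on the same data with the same classifier, then testing whether the paired per-instance correct/incorrect outcomes are independent — one might write the following. All function names, the 1-nearest-neighbor classifier, the toy data, and the plain Pearson chi-square form are illustrative assumptions, not the thesis's actual method:

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k roughly equal, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def one_nn_predict(train_X, train_y, x):
    """Plain 1-nearest-neighbour prediction (squared Euclidean distance)."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]

def cv_correctness(X, y, k, seed=0):
    """k-fold CV; return a per-instance 1/0 vector (prediction correct or not)."""
    correct = [0] * len(X)
    for fold in k_fold_indices(len(X), k, seed):
        fold_set = set(fold)
        train = [i for i in range(len(X)) if i not in fold_set]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        for i in fold:
            correct[i] = int(one_nn_predict(tX, ty, X[i]) == y[i])
    return correct

def chi_square_paired(a, b):
    """Pearson chi-square statistic (1 df) for the 2x2 table of two paired
    binary vectors; values above ~3.84 reject independence at the 5% level."""
    n = len(a)
    table = [[0, 0], [0, 0]]
    for ai, bi in zip(a, b):
        table[ai][bi] += 1
    stat = 0.0
    for r in (0, 1):
        for c in (0, 1):
            expected = (table[r][0] + table[r][1]) * (table[0][c] + table[1][c]) / n
            if expected > 0:
                stat += (table[r][c] - expected) ** 2 / expected
    return stat

# Toy two-class data set: two Gaussian clusters in the plane.
rng = random.Random(1)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
     + [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)])
y = [0] * 20 + [1] * 20

# Two evaluation methods on the same data and algorithm: 5-fold vs 10-fold CV.
correct_5 = cv_correctness(X, y, k=5)
correct_10 = cv_correctness(X, y, k=10)
print("5-fold accuracy :", sum(correct_5) / len(X))
print("10-fold accuracy:", sum(correct_10) / len(X))
print("chi-square stat :", chi_square_paired(correct_5, correct_10))
```

When the two correctness vectors turn out to be dependent, the usual independent-samples comparison of the two accuracy estimates is not valid — which is the motivation the abstract gives for the proposed paired statistical tests.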


Bibliographic Details
Main Author: Min-Ru Wei, 魏敏如
Other Authors: Tzu-Tsung Wong
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/uj4ny9
id ndltd-TW-106NCKU5396008
record_format oai_dc
spelling ndltd-TW-106NCKU53960082019-07-25T04:46:49Z http://ndltd.ncl.edu.tw/handle/uj4ny9 The Impact of Cross-Validation Methods on the Performance Estimates of Classification Algorithms 交叉驗證評估法對分類方法效能估計值之影響 Min-Ru Wei 魏敏如 Tzu-Tsung Wong 翁慈宗 2018 學位論文 ; thesis 61 zh-TW
collection NDLTD
sources NDLTD