A Study on the Appropriateness of Repeating K-fold Cross Validation

Bibliographic Details
Main Authors: Po-Yang Yeh (葉柏揚)
Other Authors: Tzu-Tsung Wong
Format: Others
Language: zh-TW
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/6jc74q
Description
Summary: Master's thesis, National Cheng Kung University, Department of Industrial and Information Management, academic year 105 (2016-2017). K-fold cross validation is a popular approach for evaluating the performance of classification algorithms. The variance of the accuracy estimate resulting from this approach is generally relatively large, which makes the resulting inference conservative. Several studies have therefore suggested repeatedly performing K-fold cross validation to reduce the variance. Most of them did not consider the correlation among the repetitions of K-fold cross validation, and hence the variance could be underestimated. The purpose of this thesis is to study the appropriateness of repeating K-fold cross validation. We first investigate whether the accuracy estimates obtained from the repetitions of K-fold cross validation can be assumed to be independent. The K-nearest neighbor algorithm with K = 1 is used to analyze the dependency relationships between the predictions of two repetitions of K-fold cross validation. Statistical methods are also proposed to test the strength of these dependency relationships. The experimental results on twenty data sets show that the predictions in two repetitions of K-fold cross validation are generally highly correlated, and that the correlation becomes higher as the number of folds increases. The results of a simulation study suggest that K-fold cross validation with a small number of repetitions and a large number of folds should be adopted.
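
The kind of experiment the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual code: it assumes a scikit-learn style workflow, uses the Iris data set as a stand-in for the twenty data sets in the study, and measures dependence simply as the correlation of per-instance correctness indicators across two repetitions of 10-fold cross validation with a 1-nearest-neighbor classifier; the fold count, data set, and random seeds are arbitrary choices for the sketch.

```python
# Minimal sketch: two repetitions of K-fold CV with 1-NN, and the correlation
# of their per-instance correctness indicators. Illustrative only; the thesis's
# own statistical tests and data sets are not reproduced here.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier


def cv_correctness(X, y, n_folds, seed):
    """Run one repetition of K-fold CV with 1-NN and return a 0/1 vector
    indicating whether each instance was classified correctly."""
    correct = np.zeros(len(y), dtype=int)
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(X):
        clf = KNeighborsClassifier(n_neighbors=1)
        clf.fit(X[train_idx], y[train_idx])
        correct[test_idx] = (clf.predict(X[test_idx]) == y[test_idx]).astype(int)
    return correct


X, y = load_iris(return_X_y=True)

# Two repetitions differ only in how instances are shuffled into folds.
rep1 = cv_correctness(X, y, n_folds=10, seed=0)
rep2 = cv_correctness(X, y, n_folds=10, seed=1)

print("accuracy, repetition 1:", rep1.mean())
print("accuracy, repetition 2:", rep2.mean())
# A correlation near 1 indicates the repetitions are far from independent,
# which is the situation the abstract reports for large fold counts.
print("correlation between repetitions:", np.corrcoef(rep1, rep2)[0, 1])
```

If the two repetitions were independent, averaging them would shrink the variance of the accuracy estimate roughly in proportion to the number of repetitions; a strong positive correlation like the one probed above is what makes that naive variance reduction overstated.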