The Data Complexity Index to Construct an Efficient Cross-validation Method

博士 === 國立成功大學 === 工業與資訊管理學系碩博士班 === 97 === Cross-validation is a widely used model evaluation method in data mining applications. However, it usually takes a lot of effort to determine the appropriate parameter values, such as training data size or the number of experiment runs, to implement a valid...

Full description

Bibliographic Details
Main Authors: Yao-hwei Fang, 方耀輝
Other Authors: Der-chiang Li
Format: Others
Language:en_US
Published: 2009
Online Access:http://ndltd.ncl.edu.tw/handle/89947946856606238014
Description
Summary:博士 === 國立成功大學 === 工業與資訊管理學系碩博士班 === 97 === Cross-validation is a widely used model evaluation method in data mining applications. However, it usually takes a lot of effort to determine the appropriate parameter values, such as training data size or the number of experiment runs, to implement a validated evaluation. This research develops an efficient cross-validation method called Complexity-based Efficient (CBE) cross-validation for binary classification problems. CBE cross-validation establishes a complexity index called the CBE index, which has high correlation with the classification accuracies. The CBE index and the sample size determination can be used to calculate the optimal training data size and the number of experiment runs to reduce model evaluation time when dealing with complex and computationally expensive classification data sets. The experiment results show that the high correlation between the found CBE index and the classification accuracies, and the performances of CBE cross-validation and K-fold Cross-validation and Repeated Random Sub-sampling Validation are similar and that the training time required for CBE cross-validation is lower than that for K-fold Cross-validation and Repeated Random Sub-sampling Validation. CBE index helps users understand the characteristics of the analyzed data in advance, and CBE cross-validation helps users find optimal training data size and the number of experiment runs to reduce model evaluation time.