The Data Complexity Index to Construct an Efficient Cross-validation Method

Bibliographic Details
Main Author: Yao-hwei Fang (方耀輝)
Other Authors: Der-chiang Li (利德江)
Format: Others
Language: en_US
Published: 2009
Online Access: http://ndltd.ncl.edu.tw/handle/89947946856606238014
id ndltd-TW-097NCKU5041067
record_format oai_dc
Chinese title: 以資料複雜度指標建構效率型交互驗證方法 (Constructing an Efficient Cross-validation Method Using a Data Complexity Index)
collection NDLTD
description Ph.D. thesis, National Cheng Kung University, Department of Industrial and Information Management (master's and doctoral program), academic year 97 (ROC calendar). Cross-validation is a widely used model evaluation method in data mining applications. However, determining appropriate parameter values for a valid evaluation, such as the training data size or the number of experiment runs, usually takes considerable effort. This research develops an efficient cross-validation method for binary classification problems, called Complexity-based Efficient (CBE) cross-validation. CBE cross-validation establishes a complexity index, the CBE index, that is highly correlated with classification accuracy. Combined with sample size determination, the CBE index can be used to calculate the optimal training data size and number of experiment runs, reducing model evaluation time for complex and computationally expensive classification data sets. The experimental results show that the CBE index is highly correlated with classification accuracy, that CBE cross-validation performs similarly to K-fold cross-validation and repeated random sub-sampling validation, and that CBE cross-validation requires less training time than either of those methods. The CBE index helps users understand the characteristics of the data before analysis, and CBE cross-validation helps users find the optimal training data size and number of experiment runs, reducing model evaluation time.
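The record does not define the CBE index, so the sketch below covers only the standard pieces the abstract names: K-fold cross-validation, repeated random sub-sampling validation, and one textbook form of sample size determination for choosing the number of experiment runs. It uses only the Python standard library. The function names, the normal-approximation formula n = ceil((z * s / E)^2), and the pilot accuracies are illustrative assumptions, not the thesis's own method.

```python
import math
import random
from statistics import NormalDist, stdev


def k_fold_splits(n, k, seed=0):
    """Yield (train, test) index lists for K-fold cross-validation.

    Every sample lands in exactly one test fold; fold sizes differ by at most one.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Training set = all folds except the held-out one.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def random_subsampling_splits(n, train_size, runs, seed=0):
    """Yield (train, test) index lists for repeated random sub-sampling validation.

    Each run draws a fresh random split, so test sets may overlap across runs.
    """
    rng = random.Random(seed)
    for _ in range(runs):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[:train_size], idx[train_size:]


def runs_needed(pilot_accuracies, margin, confidence=0.95):
    """Hypothetical helper: number of runs n = ceil((z * s / E)^2) so that the
    mean accuracy is estimated within +/- margin (E) at the given confidence.

    Assumption: a normal approximation, with s estimated from pilot runs.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    s = stdev(pilot_accuracies)  # sample standard deviation of pilot accuracies
    return max(1, math.ceil((z * s / margin) ** 2))


if __name__ == "__main__":
    for train, test in k_fold_splits(10, 5):
        print(len(train), len(test))  # prints "8 2" five times
    pilot = [0.81, 0.84, 0.79, 0.83, 0.80]  # hypothetical pilot accuracies
    print(runs_needed(pilot, margin=0.01))  # 17
```

In this sketch the training data size is an explicit `train_size` argument and the number of runs comes from pilot-run variance; in CBE cross-validation, according to the abstract, both values would instead be derived with the help of the CBE index, which is not reproduced here.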