The Data Complexity Index to Construct an Efficient Cross-validation Method

PhD === National Cheng Kung University === Department of Industrial and Information Management (Master's and Doctoral Program) === 97 === Cross-validation is a widely used model evaluation method in data mining applications. However, it usually takes considerable effort to determine appropriate parameter values, such as the training data size or the number of experiment runs, to implement a valid...

Bibliographic Details
Main Authors:	Yao-hwei Fang, 方耀輝
Other Authors:	Der-chiang Li
Format:	Others
Language:	en_US
Published:	2009
Online Access:	http://ndltd.ncl.edu.tw/handle/89947946856606238014

id	ndltd-TW-097NCKU5041067
record_format	oai_dc
spelling	ndltd-TW-097NCKU50410672016-05-04T04:26:29Z http://ndltd.ncl.edu.tw/handle/89947946856606238014 The Data Complexity Index to Construct an Efficient Cross-validation Method 以資料複雜度指標建構效率型交互驗證方法 Yao-hwei Fang 方耀輝 PhD National Cheng Kung University Department of Industrial and Information Management (Master's and Doctoral Program) 97 Cross-validation is a widely used model evaluation method in data mining applications. However, it usually takes considerable effort to determine appropriate parameter values, such as the training data size or the number of experiment runs, to carry out a valid evaluation. This research develops an efficient cross-validation method, called Complexity-based Efficient (CBE) cross-validation, for binary classification problems. CBE cross-validation establishes a complexity index, called the CBE index, which correlates highly with classification accuracy. The CBE index, together with sample size determination, can be used to calculate the optimal training data size and number of experiment runs, reducing model evaluation time for complex and computationally expensive classification data sets. The experimental results show that the CBE index is highly correlated with classification accuracy, that the performance of CBE cross-validation is similar to that of K-fold cross-validation and repeated random sub-sampling validation, and that CBE cross-validation requires less training time than either of these methods. The CBE index helps users understand the characteristics of the analyzed data in advance, and CBE cross-validation helps users find the optimal training data size and number of experiment runs to reduce model evaluation time. Der-chiang Li 利德江 2009 學位論文 ; thesis 49 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	PhD === National Cheng Kung University === Department of Industrial and Information Management (Master's and Doctoral Program) === 97 === Cross-validation is a widely used model evaluation method in data mining applications. However, it usually takes considerable effort to determine appropriate parameter values, such as the training data size or the number of experiment runs, to carry out a valid evaluation. This research develops an efficient cross-validation method, called Complexity-based Efficient (CBE) cross-validation, for binary classification problems. CBE cross-validation establishes a complexity index, called the CBE index, which correlates highly with classification accuracy. The CBE index, together with sample size determination, can be used to calculate the optimal training data size and number of experiment runs, reducing model evaluation time for complex and computationally expensive classification data sets. The experimental results show that the CBE index is highly correlated with classification accuracy, that the performance of CBE cross-validation is similar to that of K-fold cross-validation and repeated random sub-sampling validation, and that CBE cross-validation requires less training time than either of these methods. The CBE index helps users understand the characteristics of the analyzed data in advance, and CBE cross-validation helps users find the optimal training data size and number of experiment runs to reduce model evaluation time.
author2	Der-chiang Li
author_facet	Der-chiang Li Yao-hwei Fang 方耀輝
author	Yao-hwei Fang 方耀輝
spellingShingle	Yao-hwei Fang 方耀輝 The Data Complexity Index to Construct an Efficient Cross-validation Method
author_sort	Yao-hwei Fang
title	The Data Complexity Index to Construct an Efficient Cross-validation Method
title_sort	data complexity index to construct an efficient cross-validation method
publishDate	2009
url	http://ndltd.ncl.edu.tw/handle/89947946856606238014

Similar Items