The Data Complexity Index to Construct an Efficient Cross-validation Method
Doctoral === National Cheng Kung University === Department of Industrial and Information Management (Master's and Doctoral Program) === 97 === Cross-validation is a widely used model evaluation method in data mining applications. However, it usually takes considerable effort to determine the appropriate parameter values, such as training data size or the number of experiment runs, to implement a valid...
Main Authors: | Yao-hwei Fang, 方耀輝 |
---|---|
Other Authors: | Der-chiang Li, 利德江 |
Format: | Others |
Language: | en_US |
Published: | 2009 |
Online Access: | http://ndltd.ncl.edu.tw/handle/89947946856606238014 |
id |
ndltd-TW-097NCKU5041067 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-097NCKU5041067 2016-05-04T04:26:29Z http://ndltd.ncl.edu.tw/handle/89947946856606238014 The Data Complexity Index to Construct an Efficient Cross-validation Method 以資料複雜度指標建構效率型交互驗證方法 Yao-hwei Fang 方耀輝 Doctoral National Cheng Kung University Department of Industrial and Information Management (Master's and Doctoral Program) 97 Cross-validation is a widely used model evaluation method in data mining applications. However, determining appropriate parameter values, such as the training data size or the number of experiment runs, usually takes considerable effort before a valid evaluation can be carried out. This research develops an efficient cross-validation method, called Complexity-based Efficient (CBE) cross-validation, for binary classification problems. CBE cross-validation establishes a complexity index, the CBE index, which is highly correlated with classification accuracy. The CBE index, together with sample size determination, can be used to calculate the optimal training data size and number of experiment runs, reducing model evaluation time for complex and computationally expensive classification data sets. The experimental results show that the CBE index is highly correlated with classification accuracy, that the performance of CBE cross-validation is similar to that of K-fold cross-validation and repeated random sub-sampling validation, and that CBE cross-validation requires less training time than either of those methods. The CBE index helps users understand the characteristics of the analyzed data in advance, and CBE cross-validation helps users find the optimal training data size and number of experiment runs to reduce model evaluation time. Der-chiang Li 利德江 2009 學位論文 ; thesis 49 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Doctoral === National Cheng Kung University === Department of Industrial and Information Management (Master's and Doctoral Program) === 97 === Cross-validation is a widely used model evaluation method in data mining applications. However, determining appropriate parameter values, such as the training data size or the number of experiment runs, usually takes considerable effort before a valid evaluation can be carried out. This research develops an efficient cross-validation method, called Complexity-based Efficient (CBE) cross-validation, for binary classification problems. CBE cross-validation establishes a complexity index, the CBE index, which is highly correlated with classification accuracy. The CBE index, together with sample size determination, can be used to calculate the optimal training data size and number of experiment runs, reducing model evaluation time for complex and computationally expensive classification data sets.
The experimental results show that the CBE index is highly correlated with classification accuracy, that the performance of CBE cross-validation is similar to that of K-fold cross-validation and repeated random sub-sampling validation, and that CBE cross-validation requires less training time than either of those methods. The CBE index helps users understand the characteristics of the analyzed data in advance, and CBE cross-validation helps users find the optimal training data size and number of experiment runs to reduce model evaluation time.
|
author2 |
Der-chiang Li |
author_facet |
Der-chiang Li Yao-hwei Fang 方耀輝 |
author |
Yao-hwei Fang 方耀輝 |
spellingShingle |
Yao-hwei Fang 方耀輝 The Data Complexity Index to Construct an Efficient Cross-validation Method |
author_sort |
Yao-hwei Fang |
title |
The Data Complexity Index to Construct an Efficient Cross-validation Method |
title_short |
The Data Complexity Index to Construct an Efficient Cross-validation Method |
title_full |
The Data Complexity Index to Construct an Efficient Cross-validation Method |
title_fullStr |
The Data Complexity Index to Construct an Efficient Cross-validation Method |
title_full_unstemmed |
The Data Complexity Index to Construct an Efficient Cross-validation Method |
title_sort |
data complexity index to construct an efficient cross-validation method |
publishDate |
2009 |
url |
http://ndltd.ncl.edu.tw/handle/89947946856606238014 |
work_keys_str_mv |
AT yaohweifang thedatacomplexityindextoconstructanefficientcrossvalidationmethod AT fāngyàohuī thedatacomplexityindextoconstructanefficientcrossvalidationmethod AT yaohweifang yǐzīliàofùzádùzhǐbiāojiàngòuxiàolǜxíngjiāohùyànzhèngfāngfǎ AT fāngyàohuī yǐzīliàofùzádùzhǐbiāojiàngòuxiàolǜxíngjiāohùyànzhèngfāngfǎ AT yaohweifang datacomplexityindextoconstructanefficientcrossvalidationmethod AT fāngyàohuī datacomplexityindextoconstructanefficientcrossvalidationmethod |
_version_ |
1718258531494264832 |
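
The abstract in this record compares CBE cross-validation against two baselines: K-fold cross-validation and repeated random sub-sampling validation. The following is a minimal sketch of how those two baselines generate train/test index splits. It is not the thesis's CBE method, whose index is not defined in this record; the function names `k_fold_splits` and `random_subsampling_splits` and their parameters are hypothetical, chosen only for illustration.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield k (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train_idx, test_idx


def random_subsampling_splits(n_samples, train_size, n_runs, seed=0):
    """Yield n_runs independent random (train_idx, test_idx) pairs."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(n_runs):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # The first train_size indices train the model; the rest test it.
        yield shuffled[:train_size], shuffled[train_size:]


if __name__ == "__main__":
    for train_idx, test_idx in k_fold_splits(n_samples=10, k=5):
        print("k-fold      ", sorted(train_idx), sorted(test_idx))
    for train_idx, test_idx in random_subsampling_splits(10, train_size=6, n_runs=3):
        print("sub-sampling", sorted(train_idx), sorted(test_idx))
```

In K-fold, the choice of k fixes both the training size (roughly n(k-1)/k) and the number of runs (k). Repeated random sub-sampling leaves both free, and these are the two parameters the abstract describes as costly to choose.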