The impact of stratification on the performance of classification algorithms evaluated by k-fold cross validation
Master's thesis (碩士) === National Cheng Kung University (國立成功大學) === Institute of Information Management (資訊管理研究所) === 105 === K-fold cross validation is one of the accuracy estimation methods used in many kinds of experimental research. Stratification, however, is seldom applied to obtain more representative data in each partition. Stratification has the advantage of reducin...
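As an illustration of what "more representative data in each partition" means, the following is a minimal sketch of standard stratification for K-fold partitioning, compared with a plain random split. It is not the thesis's implementation: the function names (`plain_folds`, `stratified_folds`, `class_counts`) and the 90/10 toy dataset are assumptions made up for this example, and only the Python standard library is used.

```python
import random
from collections import Counter, defaultdict


def plain_folds(labels, k, seed=0):
    """Shuffle all indices and deal them round-robin into k folds."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def stratified_folds(labels, k, seed=0):
    """Deal the indices of each class round-robin into k folds, so that
    per-class counts differ by at most one between folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    position = 0  # continue dealing where the previous class stopped
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for idx in members:
            folds[position % k].append(idx)
            position += 1
    return folds


def class_counts(labels, folds):
    """Per-fold class counts, to inspect how representative each fold is."""
    return [Counter(labels[i] for i in fold) for fold in folds]


# Hypothetical imbalanced dataset: 90 majority and 10 minority instances.
labels = ["majority"] * 90 + ["minority"] * 10
plain = class_counts(labels, plain_folds(labels, 5))
strat = class_counts(labels, stratified_folds(labels, 5))
print("minority per fold, plain:     ", [c["minority"] for c in plain])
print("minority per fold, stratified:", [c["minority"] for c in strat])
```

With the stratified split every fold receives exactly two of the ten minority instances, whereas the plain split leaves the minority count per fold to chance. This is the sense in which stratification can reduce the variance of the accuracy estimate.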
Main Authors: | Jian-Kuen Wu, 吳建昆 |
---|---|
Other Authors: | Tzu-Tsung Wong, 翁慈宗 |
Format: | Others |
Language: | zh-TW |
Published: | 2017 |
Online Access: | http://ndltd.ncl.edu.tw/handle/xkvvzs |
id |
ndltd-TW-105NCKU5396002 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-105NCKU5396002 2019-05-15T23:47:00Z http://ndltd.ncl.edu.tw/handle/xkvvzs The impact of stratification on the performance of classification algorithms evaluated by k-fold cross validation 分層對使用K等分交叉驗證法來評估分類方法效能之影響 Jian-Kuen,Wu 吳建昆 Master's thesis (碩士) National Cheng Kung University (國立成功大學) Institute of Information Management (資訊管理研究所) 105 Tzu-Tsung Wong 翁慈宗 2017 學位論文 ; thesis 71 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's thesis (碩士) === National Cheng Kung University (國立成功大學) === Institute of Information Management (資訊管理研究所) === 105 === K-fold cross validation is one of the accuracy estimation methods used in many kinds of experimental research. Stratification, however, is seldom applied to obtain more representative data in each partition. Stratification has the advantage of reducing the variance of estimators and thus gives a better estimate of the true accuracy. Existing research looks at stratification and imbalanced datasets from different perspectives. Studies on general datasets develop new algorithms based on standard stratification in K-fold cross validation, or investigate the estimator in terms of bias and variance. Studies on imbalanced datasets discuss the effect of applying stratification in terms of recall, precision, or other measures when a class value is rare. Many studies recommend their algorithm without using an appropriate parametric method for statistical comparison. The purpose of this study is therefore to compare these stratification methods under the same conditions, using the decision tree and k-nearest neighbors algorithms, through a reasonable statistical comparison. The results show that the performance estimated by K-fold cross validation is close whether or not stratification is applied, on single or multiple datasets, general or imbalanced. Furthermore, when time complexity is taken into account and the estimator is assumed to be stable, standard stratification can be used with K-fold cross validation. With advanced stratification, which takes into account the features that relate data instances to one another, the estimator is relatively more stable than with standard stratification. |
author2 |
Tzu-Tsung Wong |
author_facet |
Tzu-Tsung Wong Jian-Kuen,Wu 吳建昆 |
author |
Jian-Kuen,Wu 吳建昆 |
author_sort |
Jian-Kuen,Wu |
title |
The impact of stratification on the performance of classification algorithms evaluated by k-fold cross validation |
publishDate |
2017 |
url |
http://ndltd.ncl.edu.tw/handle/xkvvzs |