The impact of stratification on the performance of classification algorithms evaluated by k-fold cross validation

Master's === National Cheng Kung University === Institute of Information Management === 105 === K-fold cross validation is one of the accuracy estimation methods used in many types of experimental research. Stratification, however, is seldom performed to obtain more representative data in each partition. Stratification has the advantage of reducin...


Bibliographic Details
Main Authors: Jian-Kuen,Wu, 吳建昆
Other Authors: Tzu-Tsung Wong
Format: Others
Language: zh-TW
Published: 2017
Online Access: http://ndltd.ncl.edu.tw/handle/xkvvzs
id ndltd-TW-105NCKU5396002
record_format oai_dc
spelling ndltd-TW-105NCKU53960022019-05-15T23:47:00Z http://ndltd.ncl.edu.tw/handle/xkvvzs The impact of stratification on the performance of classification algorithms evaluated by k-fold cross validation 分層對使用K等分交叉驗證法來評估分類方法效能之影響 Jian-Kuen,Wu 吳建昆 碩士 國立成功大學 資訊管理研究所 105 Tzu-Tsung Wong 翁慈宗 2017 學位論文 ; thesis 71 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Cheng Kung University === Institute of Information Management === 105 === K-fold cross validation is one of the accuracy estimation methods used in many types of experimental research. Stratification, however, is seldom performed to obtain more representative data in each partition. Stratification has the advantage of reducing the variance of estimators and thus gives a better estimate of the true accuracy. Research has looked at stratification and imbalanced datasets from different perspectives. General datasets are used to develop new algorithms from standard stratification on K-fold cross validation, or to investigate estimators in terms of bias and variance. Imbalanced datasets are used to discuss the performance of applying stratification in terms of recall, precision, or other measures when a class value is rare. Many studies recommend their algorithms without an appropriate parametric method for statistical comparison. The purpose of this study is therefore to compare these stratification methods under the same conditions, using the decision tree and k-nearest neighbors algorithms, through reasonable statistical comparison. The results demonstrate that the estimated performance under K-fold cross validation is close whether or not stratification is implemented, for single or multiple general or imbalanced datasets. Furthermore, when time complexity is taken into account and the estimator is assumed to be stable, standard stratification can be used with K-fold cross validation. With advanced stratification, which takes into account the features shared between data instances, the estimator is relatively more stable than with standard stratification.
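The difference between plain and stratified partitioning described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names k_fold_indices and minority_counts, the toy 90/10 label list, and the round-robin dealing scheme are all assumptions made for illustration, and the advanced stratification methods mentioned in the abstract are not shown.

```python
import random
from collections import defaultdict


def k_fold_indices(labels, k, stratified, seed=0):
    """Assign each instance index to one of k folds (hypothetical helper).

    With stratified=True, instances are dealt to folds class by class,
    so every fold receives roughly the same class proportions as the
    whole dataset. With stratified=False, instances are shuffled and
    dealt without regard to class.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    if stratified:
        by_class = defaultdict(list)
        for idx, label in enumerate(labels):
            by_class[label].append(idx)
        position = 0  # continue dealing where the previous class stopped
        for label in sorted(by_class):  # assumes labels are sortable
            members = by_class[label]
            rng.shuffle(members)
            for idx in members:
                folds[position % k].append(idx)
                position += 1
    else:
        order = list(range(len(labels)))
        rng.shuffle(order)
        for position, idx in enumerate(order):
            folds[position % k].append(idx)
    return folds


def minority_counts(labels, folds, minority):
    """Count how many instances of the rare class land in each fold."""
    return [sum(1 for i in fold if labels[i] == minority) for fold in folds]


# Toy imbalanced dataset (assumed for illustration): 90 "neg", 10 "pos".
labels = ["neg"] * 90 + ["pos"] * 10
plain = k_fold_indices(labels, k=5, stratified=False)
strat = k_fold_indices(labels, k=5, stratified=True)
print("plain     :", minority_counts(labels, plain, "pos"))
print("stratified:", minority_counts(labels, strat, "pos"))
```

In this toy case the stratified split places exactly two rare-class instances in each of the five folds, while the plain split may spread them unevenly depending on the shuffle. Whether that difference changes the accuracy estimate in practice is the question the thesis examines.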
author2 Tzu-Tsung Wong
author_facet Tzu-Tsung Wong
Jian-Kuen,Wu
吳建昆
author Jian-Kuen,Wu
吳建昆
spellingShingle Jian-Kuen,Wu
吳建昆
The impact of stratification on the performance of classification algorithms evaluated by k-fold cross validation
author_sort Jian-Kuen,Wu
title The impact of stratification on the performance of classification algorithms evaluated by k-fold cross validation
publishDate 2017
url http://ndltd.ncl.edu.tw/handle/xkvvzs
_version_ 1719154723177627648