Statistical methods for comparing the performance of two classification algorithms on imbalanced data sets

Master's === National Cheng Kung University === Institute of Information Management === 104 === The performance of classification algorithms is generally evaluated by accuracy. However, when the numbers of instances or the misclassification costs differ greatly across class values, accuracy is no longer an appropriate measure for performance e...


Bibliographic Details
Main Authors:	Che-Hsuan Lin, 林哲玄
Other Authors:	Tzu-Tsung Wong
Format:	Others
Language:	zh-TW
Published:	2016
Online Access:	http://ndltd.ncl.edu.tw/handle/39398602168263993420

id	ndltd-TW-104NCKU5396008
record_format	oai_dc
spelling	ndltd-TW-104NCKU5396008 2017-10-29T04:35:10Z http://ndltd.ncl.edu.tw/handle/39398602168263993420 Statistical methods for comparing the performance of two classification algorithms on imbalanced data sets 不平衡資料檔下比較兩分類演算法效能之統計方法 Che-Hsuan Lin 林哲玄 Master's, National Cheng Kung University, Institute of Information Management, 104 Tzu-Tsung Wong 翁慈宗 2016 學位論文 ; thesis 89 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Cheng Kung University === Institute of Information Management === 104 === The performance of classification algorithms is generally evaluated by accuracy. However, when the numbers of instances or the misclassification costs differ greatly across class values, accuracy is no longer an appropriate measure for performance evaluation. Measures such as recall and precision are better choices for imbalanced data sets. This study presents parametric methods for comparing the performance of two classification algorithms on multiple imbalanced data sets when the evaluation measure is recall, precision, or their arithmetic mean. When the testing results satisfy the large-sample conditions, the sampling distributions of both recall and precision can be assumed to be normal. Since recall and precision for the same data set are dependent, they are assumed to jointly follow a bivariate normal distribution, from which the sampling distribution of their arithmetic mean is derived. Four classification algorithms are considered in this study. The experimental results on seven imbalanced data sets demonstrate that the proposed parametric methods can effectively compare the performance of two classification algorithms on multiple imbalanced data sets.
author2	Tzu-Tsung Wong
author_facet	Tzu-Tsung Wong Che-Hsuan Lin 林哲玄
author	Che-Hsuan Lin 林哲玄
spellingShingle	Che-Hsuan Lin 林哲玄 Statistical methods for comparing the performance of two classification algorithms on imbalanced data sets
author_sort	Che-Hsuan Lin
title	Statistical methods for comparing the performance of two classification algorithms on imbalanced data sets
publishDate	2016
url	http://ndltd.ncl.edu.tw/handle/39398602168263993420
_version_	1718558148563828736
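
The description above refers to normal approximations for recall and precision and to a bivariate normal assumption for the two measures when their arithmetic mean is used. The following is a minimal Python sketch of that general idea, not the derivation used in the thesis. The function names, the binomial standard-error formula, and the `rho` parameter (an assumed correlation between recall and precision) are illustrative assumptions that do not come from the record.

```python
import math


def proportion_se(successes, trials):
    """Return a proportion and its large-sample (binomial) standard error.

    Assumption: trials > 0 and the large-sample conditions hold, so the
    proportion is treated as approximately normal.
    """
    p = successes / trials
    return p, math.sqrt(p * (1 - p) / trials)


def mean_recall_precision(tp, fp, fn, rho=0.0):
    """Return the arithmetic mean of recall and precision and its standard error.

    tp, fp, fn: true positives, false positives, false negatives.
    rho: hypothetical correlation between recall and precision; it is an
         input here because this sketch does not estimate it.
    """
    recall, se_r = proportion_se(tp, tp + fn)      # TP / (TP + FN)
    precision, se_p = proportion_se(tp, tp + fp)   # TP / (TP + FP)
    mean = (recall + precision) / 2
    # Var((R + P) / 2) = (Var(R) + Var(P) + 2 Cov(R, P)) / 4,
    # with Cov(R, P) = rho * se_r * se_p under the bivariate normal assumption.
    var = (se_r ** 2 + se_p ** 2 + 2 * rho * se_r * se_p) / 4
    return mean, math.sqrt(var)


if __name__ == "__main__":
    mean, se = mean_recall_precision(tp=80, fp=20, fn=20, rho=0.5)
    print(f"mean = {mean:.3f}, standard error = {se:.4f}")
```

Because recall and precision share the true-positive count, ignoring their dependence (setting `rho` to 0) changes the standard error of the mean, which is why a joint distribution is needed rather than two independent normal ones.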


Similar Items