A Linear Approximation Approach of F-Measure for Evaluating the Performance of Classification Algorithms on Imbalanced Data Sets

Master's thesis === National Cheng Kung University === Institute of Information Management === academic year 106 === The performance of classification algorithms is generally evaluated by accuracy on large amounts of data. Accuracy is one of the most convenient and direct indicators. However, classification algorithms tend to predict most of the data as the majority of the...
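The point about accuracy on imbalanced data can be illustrated with a small hypothetical example. The confusion-matrix counts below are invented for illustration and are not results from the thesis; the minority class is treated as the positive class. A classifier that predicts the majority class for every instance scores higher accuracy than one that actually detects minority instances, yet its F-measure (the harmonic mean of precision and recall) is 0:

```python
# Hypothetical counts (not from the thesis): 1000 instances,
# 50 in the minority (positive) class and 950 in the majority (negative) class.


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all instances that are classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive (minority) class."""
    if tp == 0:
        # Recall is 0 and precision is 0 or undefined; treat F as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Labels every instance as the majority class.
all_majority = dict(tp=0, fp=0, fn=50, tn=950)
# Finds 40 of the 50 minority instances at the cost of 60 false alarms.
finds_minority = dict(tp=40, fp=60, fn=10, tn=890)

for name, c in [("all_majority", all_majority), ("finds_minority", finds_minority)]:
    acc = accuracy(**c)
    f1 = f_measure(c["tp"], c["fp"], c["fn"])
    print(f"{name}: accuracy={acc:.3f}, F-measure={f1:.3f}")
```

Here the all-majority classifier reaches an accuracy of 0.950 against 0.930, while its F-measure is 0.000 against about 0.533, which is why the F-measure is the more informative criterion in this setting.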


Bibliographic Details
Main Authors:	Wen-Jing Chen, 陳玟靜
Other Authors:	Tzu-Tsung Wong
Format:	Others
Language:	zh-TW
Published:	2018
Online Access:	http://ndltd.ncl.edu.tw/handle/w3262v
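The abstract describes deriving an approximate sampling distribution of the F-measure by linear approximation, with recall and precision treated as jointly bivariate normal, and then testing whether two algorithms differ. The sketch below shows one standard way to do this, a first-order Taylor expansion (the delta method), for two algorithms evaluated on a single data set. It is an illustration under stated assumptions, not the thesis's exact procedure: the function names `f_and_gradient` and `delta_method_z_test` are hypothetical, and the caller is assumed to already have estimated means of precision and recall for both algorithms plus a 4x4 covariance matrix for those estimates (how they are obtained is outside this sketch). The multi-data-set case mentioned in the abstract is not covered.

```python
import math
from typing import Sequence


def f_and_gradient(p: float, r: float) -> tuple[float, float, float]:
    """Return F = 2PR/(P+R) and its partial derivatives dF/dP, dF/dR at (p, r)."""
    s = p + r
    if s <= 0:
        raise ValueError("precision + recall must be positive")
    f = 2 * p * r / s
    # dF/dP = 2R^2/(P+R)^2 and dF/dR = 2P^2/(P+R)^2
    return f, 2 * r * r / (s * s), 2 * p * p / (s * s)


def delta_method_z_test(
    mean: Sequence[float], cov: Sequence[Sequence[float]]
) -> tuple[float, float, float]:
    """Hypothetical helper: compare the F-measures of algorithms A and B.

    mean: estimated (P_A, R_A, P_B, R_B); cov: 4x4 covariance of those estimates.
    Assumes (P_A, R_A, P_B, R_B) is approximately multivariate normal, so the
    linearized difference D = F_A - F_B is approximately normal.
    Returns (F_A - F_B, approximate standard error, two-sided p-value).
    """
    p_a, r_a, p_b, r_b = mean
    f_a, dpa, dra = f_and_gradient(p_a, r_a)
    f_b, dpb, drb = f_and_gradient(p_b, r_b)
    # Gradient of D = F_A - F_B with respect to (P_A, R_A, P_B, R_B).
    g = [dpa, dra, -dpb, -drb]
    # Linear approximation: Var(D) ~= g^T * cov * g.
    var_d = sum(g[i] * cov[i][j] * g[j] for i in range(4) for j in range(4))
    if var_d <= 0:
        raise ValueError("approximate variance must be positive")
    se = math.sqrt(var_d)
    z = (f_a - f_b) / se
    # Two-sided p-value for a standard normal statistic: P(|Z| >= |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return f_a - f_b, se, p_value
```

For example, with hypothetical estimates `mean = (0.8, 0.8, 0.5, 0.5)` and `cov` equal to 0.01 times the 4x4 identity matrix, the difference in F-measure is 0.3 with an approximate standard error of 0.1, giving z = 3 and a two-sided p-value of about 0.0027. Off-diagonal entries of `cov` matter in practice, because precision and recall share true positives and two algorithms run on the same data are usually correlated.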

id	ndltd-TW-106NCKU5396012
record_format	oai_dc
spelling	ndltd-TW-106NCKU5396012 2019-07-25T04:46:49Z http://ndltd.ncl.edu.tw/handle/w3262v A Linear Approximation Approach of F-Measure for Evaluating the Performance of Classification Algorithms on Imbalanced Data Sets 用線性概約法來推導F測度抽樣分配以衡量分類方法在不平衡資料檔上效能之研究 (Chinese title: Using Linear Approximation to Derive the Sampling Distribution of the F-Measure for Evaluating the Performance of Classification Methods on Imbalanced Data Sets) Wen-Jing Chen 陳玟靜 Master's thesis, National Cheng Kung University, Institute of Information Management, academic year 106 Tzu-Tsung Wong 翁慈宗 2018 學位論文 ; thesis 49 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's thesis === National Cheng Kung University === Institute of Information Management === academic year 106 === The performance of classification algorithms is generally evaluated by accuracy on large amounts of data. Accuracy is one of the most convenient and direct indicators. However, on imbalanced data sets, classification algorithms tend to assign most instances to the majority class, so accuracy is no longer an appropriate measure for performance evaluation. The F-measure is the harmonic mean of precision and recall, and these two indicators are dependent on each other. Consequently, no appropriate parametric method has been available for comparing the F-measures of different classification algorithms. This study presents parametric methods for comparing the performance of two classification algorithms on one or multiple imbalanced data sets, treating recall and precision as jointly following a bivariate normal distribution. Hypothesis tests are then used to determine whether there is a significant difference between the two classification algorithms. The main purpose is to use the F-measure for performance evaluation on imbalanced data. Four classification algorithms are considered in this study. The experimental results show that the naive Bayes method performs poorly on imbalanced data sets. A comparison with the nonparametric Wilcoxon signed-rank test shows that the parametric method proposed in this study can effectively compare the performance of two classification algorithms.
author2	Tzu-Tsung Wong
author_facet	Tzu-Tsung Wong Wen-Jing Chen 陳玟靜
author	Wen-Jing Chen 陳玟靜
author_sort	Wen-Jing Chen
title	A Linear Approximation Approach of F-Measure for Evaluating the Performance of Classification Algorithms on Imbalanced Data Sets
publishDate	2018
url	http://ndltd.ncl.edu.tw/handle/w3262v
