Consistency Analysis of Hybrid Discretization Method among Classification Algorithms

Master's === National Cheng Kung University === Institute of Information Management === 103 === Discretization is one of the major approaches to processing continuous attributes for classification. However, the accuracy obtained on a data set can differ greatly depending on which discretization method is applied. Hybrid discretization method was propo...


Bibliographic Details
Main Authors: Bo-Han Huang, 黃柏翰
Other Authors: Tzu-Tsung Wong
Format: Others
Language: zh-TW
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/41008670954809817719
id ndltd-TW-103NCKU5396011
record_format oai_dc
spelling ndltd-TW-103NCKU5396011 2016-05-22T04:40:56Z http://ndltd.ncl.edu.tw/handle/41008670954809817719 Consistency Analysis of Hybrid Discretization Method among Classification Algorithms 不同分類器的混合型離散化方法之一致性分析 Bo-Han Huang 黃柏翰 Master's National Cheng Kung University Institute of Information Management 103 Tzu-Tsung Wong 翁慈宗 2015 thesis 50 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Cheng Kung University === Institute of Information Management === 103 === Discretization is one of the major approaches to processing continuous attributes for classification. However, the accuracy obtained on a data set can differ greatly depending on which discretization method is applied. Hybrid discretization, which was proposed recently, generally achieves better performance for the naïve Bayesian classifier than unified discretization. An earlier study developed a hybrid discretization method intended to be applicable to classifiers in general, which determines the discretization method for each attribute in the data preprocessing step. However, the results of that study showed that the method cannot improve the performance of decision trees. The objective of this study is therefore to investigate the consistency of hybrid discretization results among classification algorithms. This study proposes two approaches to consistency analysis. The first identifies whether the best hybrid discretization results for one classification algorithm can improve the performance of the others. The second is a new measure that evaluates the consistency of the best hybrid discretization results of two classification algorithms. The classification tools used for testing are decision trees, naïve Bayesian classifiers, and rule-based classifiers. The experimental results on 30 data sets show that the best hybrid discretization results for one algorithm seldom improve the performance of the others. Moreover, most values of the consistency measure are low. These results suggest that the characteristics of a classification algorithm should be considered when designing a hybrid discretization method for data preprocessing.
author2 Tzu-Tsung Wong
author_facet Tzu-Tsung Wong
Bo-Han Huang
黃柏翰
author Bo-Han Huang
黃柏翰
title Consistency Analysis of Hybrid Discretization Method among Classification Algorithms
publishDate 2015
url http://ndltd.ncl.edu.tw/handle/41008670954809817719
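
The description above says that a hybrid discretization method determines the discretization method for each attribute, in contrast to unified discretization. The record does not say which discretization methods the thesis uses or how it selects among them, so the following is only a minimal sketch under stated assumptions: equal-width and equal-frequency binning stand in as two candidate methods, the per-attribute assignment (`plan`) is supplied by hand, and every function name is hypothetical rather than taken from the thesis.

```python
from typing import Callable, Dict, List

Discretizer = Callable[[List[float], int], List[int]]


def equal_width_bins(values: List[float], k: int) -> List[int]:
    """Assign each value to one of k intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute has a single interval.
        return [0] * len(values)
    width = (hi - lo) / k
    # Clamp so that the maximum value falls in the last interval.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values: List[float], k: int) -> List[int]:
    """Assign each value to one of k intervals holding roughly equal counts.

    Simplification: tied values may land in different intervals.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = min(rank * k // n, k - 1)
    return bins


def hybrid_discretize(
    columns: Dict[str, List[float]],
    plan: Dict[str, Discretizer],
    k: int,
) -> Dict[str, List[int]]:
    """Discretize each attribute with the method assigned to it in `plan`.

    Unified discretization is the special case where `plan` maps every
    attribute to the same method.
    """
    return {name: plan[name](values, k) for name, values in columns.items()}


if __name__ == "__main__":
    data = {"a": [1.0, 2.0, 3.0, 10.0], "b": [1.0, 2.0, 3.0, 10.0]}
    plan = {"a": equal_width_bins, "b": equal_frequency_bins}
    print(hybrid_discretize(data, plan, k=2))
```

In this sketch the same attribute values are split differently by the two methods, which is why the choice of method per attribute can change what a downstream classifier sees.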