Summary: Master's thesis, National Cheng Kung University, Institute of Information Management, academic year 105.

Classification is an essential task in data mining. Preprocessing techniques are generally used to improve data quality and thereby enhance the performance of class prediction. These techniques fall into two categories: those that operate on attributes and those that operate on instances. When a classification algorithm is trained on data that have been processed by another algorithm, the approach is called hybrid classification. This study presents a hybrid classification algorithm that first uses a classification algorithm to divide the training set into two subsets. A second algorithm then learns a model from each of the two subsets and another model from the whole training set. Each test instance is classified by one of these three models. The proposed algorithm is evaluated on 20 data sets in terms of prediction accuracy and computational efficiency. The experimental results show that it significantly outperforms the naïve Bayesian classifier and decision tree learning on most data sets, although it needs more time to learn its models. Compared with two hybrid classification algorithms proposed in other studies, our algorithm achieves not only significantly higher accuracy but also a relatively lower computational cost.
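The abstract does not state how the training set is split, which two algorithms are used, or how a test instance is assigned to one of the three models. The Python sketch below is therefore only one hypothetical reading, not the thesis's method. It assumes a categorical naive Bayes as the first-stage classifier and splits the training set by whether that classifier's maximum posterior probability reaches a confidence threshold. It also assumes a one-level decision tree (OneR) as the second-stage learner. Each test instance is routed to the model of the subset its own confidence falls into, with a fallback to the whole-training-set model when that subset was too small. The class names and the `threshold` and `min_size` parameters are illustrative assumptions.

```python
from collections import Counter, defaultdict
from math import exp, log


class NaiveBayes:
    """Categorical naive Bayes with Laplace smoothing (assumed stage-1 learner)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior = Counter(y)
        self.n = len(y)
        # counts[c][j][v] = number of class-c rows whose attribute j equals v
        self.counts = {c: defaultdict(Counter) for c in self.classes}
        self.values = defaultdict(set)  # distinct values seen per attribute
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def posterior(self, row):
        """Return P(class | row) for every class, normalised to sum to 1."""
        logp = {}
        for c in self.classes:
            s = log(self.prior[c] / self.n)
            for j, v in enumerate(row):
                k = len(self.values[j]) + 1  # +1 reserves mass for unseen values
                s += log((self.counts[c][j][v] + 1) / (self.prior[c] + k))
            logp[c] = s
        m = max(logp.values())  # subtract the max for numerical stability
        z = sum(exp(s - m) for s in logp.values())
        return {c: exp(s - m) / z for c, s in logp.items()}


class OneR:
    """One-level decision tree (OneR), a stand-in for the stage-2 learner."""

    def fit(self, X, y):
        self.default = Counter(y).most_common(1)[0][0]
        best = None
        for j in range(len(X[0])):
            by_value = defaultdict(Counter)
            for row, c in zip(X, y):
                by_value[row[j]][c] += 1
            # Each attribute value predicts its majority class
            rule = {v: cnt.most_common(1)[0][0] for v, cnt in by_value.items()}
            errors = sum(sum(cnt.values()) - cnt[rule[v]] for v, cnt in by_value.items())
            if best is None or errors < best[0]:  # earlier attribute wins ties
                best = (errors, j, rule)
        _, self.attr, self.rule = best
        return self

    def predict(self, row):
        return self.rule.get(row[self.attr], self.default)


class HybridClassifier:
    """Hypothetical three-model hybrid: split by stage-1 confidence, then route."""

    def __init__(self, threshold=0.9, min_size=5):
        self.threshold = threshold
        self.min_size = max(1, min_size)  # never fit a model on an empty subset

    def _confident(self, row):
        return max(self.stage1.posterior(row).values()) >= self.threshold

    def fit(self, X, y):
        self.stage1 = NaiveBayes().fit(X, y)
        self.whole = OneR().fit(X, y)  # model learned from the whole training set
        subsets = {True: ([], []), False: ([], [])}
        for row, c in zip(X, y):
            xs, ys = subsets[self._confident(row)]
            xs.append(row)
            ys.append(c)
        # A subset-specific model is trained only when the subset is large enough
        self.sub = {key: OneR().fit(xs, ys) if len(xs) >= self.min_size else None
                    for key, (xs, ys) in subsets.items()}
        return self

    def predict(self, row):
        model = self.sub[self._confident(row)]
        return (self.whole if model is None else model).predict(row)
```

Other learners can be substituted as long as they expose the same `fit` and `predict` methods, plus `posterior` for the first stage. The confidence-based split is just one plausible way to produce the two subsets the abstract mentions.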