Summary: Master's thesis, Ming Chuan University (銘傳大學), Master's Program, Department of Information Engineering (資訊工程學系碩士班), academic year 97 (ROC calendar). Classification is a well-studied technique in data mining and machine learning. Because classification can be used for prediction, it has been applied in many real-world applications. In general, a classifier usually performs well when the target classes are uniformly distributed in the training dataset. In real-world applications, however, the class distribution is often imbalanced; this is known as the imbalanced class distribution problem. When most of the training data belong to the majority class and only a small fraction belong to the minority class, the classifier tends to predict every test instance as the majority class. Yet for a decision maker, prediction performance on the minority class is usually the most important part. This thesis therefore combines cluster analysis with classification for imbalanced data: clustering is used to filter out most of the majority-class data, which raises the proportion of minority-class data and reduces the degree of class imbalance. The experimental results show that the proposed approach outperforms existing methods.
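The abstract does not say which clustering algorithm or selection rule the thesis uses to filter the majority class. The sketch below shows one common form of cluster-based undersampling, under stated assumptions: the majority class is clustered with plain k-means, each cluster keeps a quota of points proportional to its size (nearest to the centroid first), and every minority instance is kept. The names `kmeans`, `cluster_undersample`, `ratio` and `k`, and the nearest-to-centroid rule are hypothetical illustrations, not the author's method.

```python
import random
from typing import Hashable, List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda c: _dist2(p, centroids[c]))


def kmeans(points: Sequence[Point], k: int, iters: int = 50,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd's k-means: returns k centroids and a cluster label per point."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    for _ in range(iters):
        labels = [_nearest(p, centroids) for p in points]        # assignment step
        updated = []
        for c in range(k):                                        # update step
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    # Recompute labels so they always match the returned centroids.
    return centroids, [_nearest(p, centroids) for p in points]


def cluster_undersample(X: Sequence[Point], y: Sequence[Hashable],
                        majority_label: Hashable, ratio: float = 1.0, k: int = 3,
                        seed: int = 0) -> Tuple[List[Point], List[Hashable]]:
    """Hypothetical cluster-based undersampling (an assumption, not the thesis's rule):
    keep every minority row and about ratio * |minority| majority rows, drawn from
    each k-means cluster of the majority class in proportion to its size."""
    majority = [x for x, lab in zip(X, y) if lab == majority_label]
    minority = [(x, lab) for x, lab in zip(X, y) if lab != majority_label]
    if not majority or not minority:
        return list(X), list(y)
    k = min(k, len(majority))
    target = min(len(majority), max(k, round(ratio * len(minority))))
    centroids, labels = kmeans(majority, k, seed=seed)
    kept: List[Point] = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if not members:
            continue
        # Proportional quota, at least one point so no sub-cluster disappears.
        quota = max(1, round(target * len(members) / len(majority)))
        # Assumed selection rule: points closest to the centroid are most representative.
        members.sort(key=lambda p: _dist2(p, centroids[c]))
        kept.extend(members[:quota])
    X_out = [x for x, _ in minority] + kept
    y_out = [lab for _, lab in minority] + [majority_label] * len(kept)
    return X_out, y_out


if __name__ == "__main__":
    # Toy data: three majority blobs of 30 points each and 10 minority points.
    X = ([(i * 0.1, j * 0.1) for i in range(6) for j in range(5)]
         + [(10 + i * 0.1, j * 0.1) for i in range(6) for j in range(5)]
         + [(i * 0.1, 10 + j * 0.1) for i in range(6) for j in range(5)]
         + [(5 + i * 0.1, 5.0) for i in range(10)])
    y = [0] * 90 + [1] * 10
    Xb, yb = cluster_undersample(X, y, majority_label=0)
    print("majority kept:", yb.count(0), "minority kept:", yb.count(1))
```

Taking points from every cluster, rather than sampling the majority class uniformly at random, is meant to preserve each majority sub-region while shrinking its volume; the resulting roughly balanced set can then be passed to any classifier.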