Combine Cluster Analysis to Classification Prediction in Imbalance Data Distribution

Master's === Ming Chuan University === Master's Program, Department of Computer Science and Information Engineering === 97 === Classification is a well-studied technique in data mining and machine learning. Because of its predictive power, it is used in many real-world applications. In general, a classifier performs well when the distributio...

Bibliographic Details
Main Authors:	Shin-Mau Chen, 陳心懋
Other Authors:	Yue-Shi Lee
Format:	Others
Language:	zh-TW
Published:	2009
Online Access:	http://ndltd.ncl.edu.tw/handle/01578873641582943337

id	ndltd-TW-097MCU05392013
record_format	oai_dc
spelling	ndltd-TW-097MCU05392013 2017-05-14T04:31:27Z http://ndltd.ncl.edu.tw/handle/01578873641582943337 Combine Cluster Analysis to Classification Prediction in Imbalance Data Distribution 結合集群分析於不平衡資料之分類預測方法 Shin-Mau Chen 陳心懋 Master's Ming Chuan University Master's Program, Department of Computer Science and Information Engineering 97 Classification is a well-studied technique in data mining and machine learning. Because of its predictive power, it is used in many real-world applications. In general, a classifier performs well when the target classes in the training dataset are uniformly distributed. In real-world applications, however, the distribution of the target class is often imbalanced; this is known as the imbalanced class distribution problem. When most of the training data belong to the majority class and only a few belong to the minority class, the classifier tends to predict all test data as the majority class. Yet prediction performance on the minority class is what matters most to a decision maker. Hence, this thesis combines cluster analysis with classification prediction on imbalanced data to filter out most of the majority-class data, increase the proportion of minority-class data, and reduce the degree of class imbalance. The experimental results show that our approach outperforms the existing methods. Yue-Shi Lee 李御璽 2009 thesis 84 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === Ming Chuan University === Master's Program, Department of Computer Science and Information Engineering === 97 === Classification is a well-studied technique in data mining and machine learning. Because of its predictive power, it is used in many real-world applications. In general, a classifier performs well when the target classes in the training dataset are uniformly distributed. In real-world applications, however, the distribution of the target class is often imbalanced; this is known as the imbalanced class distribution problem. When most of the training data belong to the majority class and only a few belong to the minority class, the classifier tends to predict all test data as the majority class. Yet prediction performance on the minority class is what matters most to a decision maker. Hence, this thesis combines cluster analysis with classification prediction on imbalanced data to filter out most of the majority-class data, increase the proportion of minority-class data, and reduce the degree of class imbalance. The experimental results show that our approach outperforms the existing methods.
author2	Yue-Shi Lee
author_facet	Yue-Shi Lee Shin-Mau Chen 陳心懋
author	Shin-Mau Chen 陳心懋
spellingShingle	Shin-Mau Chen 陳心懋 Combine Cluster Analysis to Classification Prediction in Imbalance Data Distribution
author_sort	Shin-Mau Chen
title	Combine Cluster Analysis to Classification Prediction in Imbalance Data Distribution
publishDate	2009
url	http://ndltd.ncl.edu.tw/handle/01578873641582943337

Similar Items