Combine Cluster Analysis to Classification Prediction in Imbalance Data Distribution

Master's === Ming Chuan University === Master's Program, Department of Computer Science and Information Engineering === 97 === Classification is a well-studied technique in data mining and machine learning. Because of its predictive nature, it has been used in many real-world applications. In general, a classifier performs well when the distributio...

Bibliographic Details
Main Authors: Shin-Mau Chen, 陳心懋
Other Authors: Yue-Shi Lee
Format: Others
Language: zh-TW
Published: 2009
Online Access: http://ndltd.ncl.edu.tw/handle/01578873641582943337
id ndltd-TW-097MCU05392013
record_format oai_dc
spelling ndltd-TW-097MCU05392013 2017-05-14T04:31:27Z http://ndltd.ncl.edu.tw/handle/01578873641582943337 Combine Cluster Analysis to Classification Prediction in Imbalance Data Distribution 結合集群分析於不平衡資料之分類預測方法 Shin-Mau Chen 陳心懋 Master's, Ming Chuan University, Master's Program, Department of Computer Science and Information Engineering, 97 Yue-Shi Lee 李御璽 2009 thesis 84 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === Ming Chuan University === Master's Program, Department of Computer Science and Information Engineering === 97 === Classification is a well-studied technique in data mining and machine learning. Because of its predictive nature, it has been used in many real-world applications. In general, a classifier performs well when the target classes in the training dataset are uniformly distributed. In real-world applications, however, the distribution of the target class is often imbalanced; this is known as the imbalanced class distribution problem. When most of the training data belong to the majority class and only a few belong to the minority class, the classifier tends to predict all test data as the majority class. Yet prediction performance on the minority class is what matters most to a decision maker. Hence, this work combines cluster analysis with classification prediction on imbalanced data: it filters out most of the majority-class data, which increases the proportion of minority-class data and reduces the degree of class imbalance. The experimental results show that the proposed approach outperforms the existing methods.
author2 Yue-Shi Lee
author_facet Yue-Shi Lee
Shin-Mau Chen
陳心懋
author Shin-Mau Chen
陳心懋
author_sort Shin-Mau Chen
title Combine Cluster Analysis to Classification Prediction in Imbalance Data Distribution
publishDate 2009
url http://ndltd.ncl.edu.tw/handle/01578873641582943337