Combine Cluster Analysis to Classification Prediction in Imbalance Data Distribution
Master's === Ming Chuan University === Master's Program, Department of Computer Science and Information Engineering === 97 === Classification is a well-studied technique in data mining and machine learning. Because of its predictive nature, it is used in many real-world applications. In general, a classifier performs well when the distributio...
Main Authors: | Shin-Mau Chen, 陳心懋 |
---|---|
Other Authors: | Yue-Shi Lee |
Format: | Others |
Language: | zh-TW |
Published: | 2009 |
Online Access: | http://ndltd.ncl.edu.tw/handle/01578873641582943337 |
id |
ndltd-TW-097MCU05392013 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-097MCU05392013 2017-05-14T04:31:27Z http://ndltd.ncl.edu.tw/handle/01578873641582943337 Combine Cluster Analysis to Classification Prediction in Imbalance Data Distribution 結合集群分析於不平衡資料之分類預測方法 Shin-Mau Chen 陳心懋 Master's, Ming Chuan University, Master's Program, Department of Computer Science and Information Engineering, 97 Classification is a well-studied technique in data mining and machine learning. Because of its predictive nature, it is used in many real-world applications. In general, a classifier performs well when the target classes in the training dataset are uniformly distributed. In real-world applications, however, the distribution of the target class is often imbalanced; this is called the imbalanced class distribution problem. When most of the training data belong to the majority class and only a few examples belong to the minority class, the classifier tends to predict all test data as the majority class. Yet prediction performance on the minority class is what matters most to a decision maker. Hence, this paper combines cluster analysis with classification prediction on imbalanced data: it filters out most of the majority-class data, which increases the proportion of minority-class data and reduces the degree of class imbalance. The experimental results show that our approach outperforms the existing methods. Yue-Shi Lee 李御璽 2009 Degree thesis ; thesis 84 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others
|
sources |
NDLTD |
description |
Master's === Ming Chuan University === Master's Program, Department of Computer Science and Information Engineering === 97 === Classification is a well-studied technique in data mining and machine learning. Because of its predictive nature, it is used in many real-world applications. In general, a classifier performs well when the target classes in the training dataset are uniformly distributed. In real-world applications, however, the distribution of the target class is often imbalanced; this is called the imbalanced class distribution problem. When most of the training data belong to the majority class and only a few examples belong to the minority class, the classifier tends to predict all test data as the majority class. Yet prediction performance on the minority class is what matters most to a decision maker. Hence, this paper combines cluster analysis with classification prediction on imbalanced data: it filters out most of the majority-class data, which increases the proportion of minority-class data and reduces the degree of class imbalance. The experimental results show that our approach outperforms the existing methods.
|
author2 |
Yue-Shi Lee |
author_facet |
Yue-Shi Lee Shin-Mau Chen 陳心懋 |
author |
Shin-Mau Chen 陳心懋 |
spellingShingle |
Shin-Mau Chen 陳心懋 Combine Cluster Analysis to Classification Prediction in Imbalance Data Distribution |
author_sort |
Shin-Mau Chen |
title |
Combine Cluster Analysis to Classification Prediction in Imbalance Data Distribution |
title_sort |
combine cluster analysis to classification prediction in imbalance data distribution |
publishDate |
2009 |
url |
http://ndltd.ncl.edu.tw/handle/01578873641582943337 |
work_keys_str_mv |
AT shinmauchen combineclusteranalysistoclassificationpredictioninimbalancedatadistribution AT chénxīnmào combineclusteranalysistoclassificationpredictioninimbalancedatadistribution AT shinmauchen jiéhéjíqúnfēnxīyúbùpínghéngzīliàozhīfēnlèiyùcèfāngfǎ AT chénxīnmào jiéhéjíqúnfēnxīyúbùpínghéngzīliàozhīfēnlèiyùcèfāngfǎ |
_version_ |
1718448360363393024 |
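
The abstract describes using cluster analysis to filter out most majority-class data so that the minority class makes up a larger share of the training set. The record does not state which clustering algorithm or selection rule the thesis uses, so the following is only a minimal sketch of one common variant of that idea, not the author's method. It assumes plain k-means, keeps one real sample per cluster (the member nearest the cluster mean), and sets the number of clusters to the minority-class size. The names `kmeans` and `undersample_majority` and the synthetic data are hypothetical and chosen for illustration.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the final groups of points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random initial centroids
    groups = []
    for _ in range(iters):
        # Assignment step: put each point with its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        updated = []
        for i, g in enumerate(groups):
            if g:
                updated.append(tuple(sum(c) / len(g) for c in zip(*g)))
            else:
                updated.append(centroids[i])  # empty cluster keeps its centroid
        if updated == centroids:
            break  # converged
        centroids = updated
    return groups


def undersample_majority(majority, k, seed=0):
    """Keep one real sample per non-empty cluster: the one nearest its mean."""
    kept = []
    for g in kmeans(majority, k, seed=seed):
        if not g:
            continue
        mean = tuple(sum(c) / len(g) for c in zip(*g))
        kept.append(min(g, key=lambda p: math.dist(p, mean)))
    return kept


# Synthetic, illustrative data: 200 majority samples and 10 minority samples.
rng = random.Random(42)
majority = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
minority = [(rng.gauss(3.0, 0.5), rng.gauss(3.0, 0.5)) for _ in range(10)]

# Assumption: one majority cluster per minority sample.
kept = undersample_majority(majority, k=len(minority))

before = len(minority) / (len(minority) + len(majority))
after = len(minority) / (len(minority) + len(kept))
print(f"minority share: {before:.2%} -> {after:.2%}")
```

A classifier would then be trained on `kept` plus `minority` rather than on the full set. Keeping real samples instead of cluster centroids is a design choice of this sketch: every training example stays an observed data point.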