Research on Unbalanced Data Classification Based on Hybrid Method
Master's === Yuan Ze University === Department of Computer Science and Engineering === 107 === Unbalanced data are ubiquitous in real-world datasets. In this paper, we investigate binary classification under an unbalanced data distribution, i.e., where the number of majority-class instances is significantly greater than the number of minority-class instances. It...
Main Authors: | Nai-Nan ZHANG 張乃楠 |
---|---|
Other Authors: | Ting Ying Chien 簡廷因 |
Format: | Others |
Language: | zh-TW |
Published: | 2018 |
Online Access: | http://ndltd.ncl.edu.tw/handle/sy353q |
id | ndltd-TW-107YZU05392001 |
---|---|
record_format | oai_dc |
spelling | ndltd-TW-107YZU05392001 2019-11-07T03:39:34Z http://ndltd.ncl.edu.tw/handle/sy353q Research on Unbalanced Data Classification Based on Hybrid Method 基於混合方法探究不平衡數據分類問題 Nai-Nan ZHANG 張乃楠 Master's Yuan Ze University Department of Computer Science and Engineering 107 Unbalanced data are ubiquitous in real-world datasets. In this paper, we investigate binary classification under an unbalanced data distribution, i.e., where the number of majority-class instances is significantly greater than the number of minority-class instances. Traditional machine learning algorithms generally attempt to minimize empirical risk and, as a result, the classification accuracy of the minority class is often sacrificed. However, the minority class is often the one of greater interest. Various data-level methods, such as over- and under-sampling, and algorithm-level methods, such as ensemble, cost-sensitive, and one-class learning, have been proposed to improve classifier performance under an unbalanced data distribution. Building on these methods, we propose a hybrid approach to the unbalanced data problem that comprises data preprocessing, clustering, data balancing, model building, and ensembling. Ting Ying Chien 簡廷因 2018 thesis 52 zh-TW |
collection | NDLTD |
language | zh-TW |
format | Others |
sources | NDLTD |
description | Master's === Yuan Ze University === Department of Computer Science and Engineering === 107 === Unbalanced data are ubiquitous in real-world datasets. In this paper, we investigate binary classification under an unbalanced data distribution, i.e., where the number of majority-class instances is significantly greater than the number of minority-class instances. Traditional machine learning algorithms generally attempt to minimize empirical risk and, as a result, the classification accuracy of the minority class is often sacrificed. However, the minority class is often the one of greater interest. Various data-level methods, such as over- and under-sampling, and algorithm-level methods, such as ensemble, cost-sensitive, and one-class learning, have been proposed to improve classifier performance under an unbalanced data distribution. Building on these methods, we propose a hybrid approach to the unbalanced data problem that comprises data preprocessing, clustering, data balancing, model building, and ensembling. |
author2 | Ting Ying Chien |
author_facet | Ting Ying Chien Nai-Nan ZHANG 張乃楠 |
author | Nai-Nan ZHANG 張乃楠 |
spellingShingle | Nai-Nan ZHANG 張乃楠 Research on Unbalanced Data Classification Based on Hybrid Method |
author_sort | Nai-Nan ZHANG |
title | Research on Unbalanced Data Classification Based on Hybrid Method |
title_short | Research on Unbalanced Data Classification Based on Hybrid Method |
title_full | Research on Unbalanced Data Classification Based on Hybrid Method |
title_fullStr | Research on Unbalanced Data Classification Based on Hybrid Method |
title_full_unstemmed | Research on Unbalanced Data Classification Based on Hybrid Method |
title_sort | research on unbalanced data classification based on hybrid method |
publishDate | 2018 |
url | http://ndltd.ncl.edu.tw/handle/sy353q |
work_keys_str_mv | AT nainanzhang researchonunbalanceddataclassificationbasedonhybridmethod AT zhāngnǎinán researchonunbalanceddataclassificationbasedonhybridmethod AT nainanzhang jīyúhùnhéfāngfǎtànjiūbùpínghéngshùjùfēnlèiwèntí AT zhāngnǎinán jīyúhùnhéfāngfǎtànjiūbùpínghéngshùjùfēnlèiwèntí |
_version_ | 1719287897159368704 |