Research on Unbalanced Data Classification Based on Hybrid Method

Master's === Yuan Ze University === Department of Computer Science and Engineering === 107 === Unbalanced data are ubiquitous in real-world datasets. In this paper, we investigate unbalanced data distribution for binary classification, i.e., where the number of majority class instances is significantly greater than the number of minority class instances. It...


Bibliographic Details
Main Authors: Nai-Nan ZHANG, 張乃楠
Other Authors: Ting Ying Chien
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/sy353q
id ndltd-TW-107YZU05392001
record_format oai_dc
spelling ndltd-TW-107YZU05392001 2019-11-07T03:39:34Z http://ndltd.ncl.edu.tw/handle/sy353q Research on Unbalanced Data Classification Based on Hybrid Method 基於混合方法探究不平衡數據分類問題 Nai-Nan ZHANG 張乃楠 Master's Yuan Ze University Department of Computer Science and Engineering 107 Unbalanced data are ubiquitous in real-world datasets. In this paper, we investigate unbalanced data distributions in binary classification, i.e., cases where the number of majority-class instances is significantly greater than the number of minority-class instances. It is assumed that traditional machine learning algorithms attempt to minimize empirical risk, and, as a result, classification accuracy on the minority class is often sacrificed. However, the minority class is often the one of interest. Various data-level methods, such as over- and under-sampling, and algorithm-level methods, such as ensemble, cost-sensitive, and one-class learning, have been proposed to improve classifier performance on unbalanced data. Building on such methods, we propose a hybrid approach to the unbalanced data problem that comprises data preprocessing, clustering, data balancing, model building, and ensembling. Ting Ying Chien 簡廷因 2018 學位論文 ; thesis 52 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === Yuan Ze University === Department of Computer Science and Engineering === 107 === Unbalanced data are ubiquitous in real-world datasets. In this paper, we investigate unbalanced data distributions in binary classification, i.e., cases where the number of majority-class instances is significantly greater than the number of minority-class instances. It is assumed that traditional machine learning algorithms attempt to minimize empirical risk, and, as a result, classification accuracy on the minority class is often sacrificed. However, the minority class is often the one of interest. Various data-level methods, such as over- and under-sampling, and algorithm-level methods, such as ensemble, cost-sensitive, and one-class learning, have been proposed to improve classifier performance on unbalanced data. Building on such methods, we propose a hybrid approach to the unbalanced data problem that comprises data preprocessing, clustering, data balancing, model building, and ensembling.
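The description names five stages (data preprocessing, clustering, data balancing, model building, ensemble) but this record does not say how the thesis implements any of them. The following is only a minimal sketch of one way such a pipeline could be wired together, under assumptions that are not taken from the thesis: min-max scaling for preprocessing, plain k-means on the majority class for clustering, random under-sampling of each cluster for balancing, a 3-nearest-neighbour vote as the base model, and majority voting as the ensemble. All function names (`minmax_scaler`, `kmeans`, `balanced_subset`, `knn_predict`, `fit_hybrid`, `predict`) and parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def minmax_scaler(rows):
    """Preprocessing (assumed): return a function that rescales features to [0, 1]."""
    lo = [min(col) for col in zip(*rows)]
    hi = [max(col) for col in zip(*rows)]

    def scale(row):
        return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]

    return scale


def kmeans(points, k, iters=20, seed=0):
    """Clustering (assumed): plain Lloyd's k-means; returns the non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute each center; keep the old one if its cluster is empty.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return [g for g in groups if g]


def balanced_subset(clusters, minority, rng):
    """Data balancing (assumed): under-sample each majority cluster, keep all minority rows."""
    per_cluster = max(1, len(minority) // len(clusters))
    subset = [(p, 1) for p in minority]  # label 1 = minority class
    for g in clusters:
        subset += [(p, 0) for p in rng.sample(g, min(per_cluster, len(g)))]
    return subset


def knn_predict(train, x, k=3):
    """Base model (assumed): k-nearest-neighbour vote over one balanced subset."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fit_hybrid(majority, minority, n_clusters=3, n_models=5, seed=0):
    """Run preprocessing, clustering, and balancing; one balanced subset per ensemble member."""
    scale = minmax_scaler(majority + minority)
    maj = [scale(r) for r in majority]
    mino = [scale(r) for r in minority]
    clusters = kmeans(maj, n_clusters, seed=seed)
    rng = random.Random(seed)
    subsets = [balanced_subset(clusters, mino, rng) for _ in range(n_models)]
    return scale, subsets


def predict(model, x):
    """Ensemble (assumed): majority vote of the per-subset k-NN predictions."""
    scale, subsets = model
    votes = Counter(knn_predict(s, scale(x)) for s in subsets)
    return votes.most_common(1)[0][0]
```

Under these assumptions, `fit_hybrid(majority, minority)` takes two lists of numeric feature rows and `predict(model, x)` returns 1 for the minority class and 0 for the majority class. Sampling from every cluster rather than from the majority class as a whole is meant to keep each region of the majority class represented after under-sampling; whether the thesis balances the data this way is not stated in the record.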
author2 Ting Ying Chien
author_facet Ting Ying Chien
Nai-Nan ZHANG
張乃楠
author Nai-Nan ZHANG
張乃楠
title Research on Unbalanced Data Classification Based on Hybrid Method
publishDate 2018
url http://ndltd.ncl.edu.tw/handle/sy353q