Consensus Clustering-Based Undersampling Approach to Imbalanced Learning

Class imbalance is an important problem encountered in machine learning applications, in which one class (the minority class) has an extremely small number of instances while the other class (the majority class) has a vast number of instances. Imbalanced datasets arise in several real-world applications, including medical diagnosis, malware detection, anomaly identification, bankruptcy prediction, and spam filtering. In this paper, we present a consensus clustering-based undersampling approach to imbalanced learning, in which the majority class is undersampled using a consensus clustering scheme. In the empirical analysis, 44 small-scale and 2 large-scale imbalanced classification benchmarks were utilized. In the consensus clustering schemes, five clustering algorithms (k-means, k-modes, k-means++, self-organizing maps, and the DIANA algorithm) and their combinations were considered. In the classification phase, five supervised learning methods (naïve Bayes, logistic regression, support vector machines, random forests, and the k-nearest neighbor algorithm) and three ensemble methods (AdaBoost, bagging, and the random subspace algorithm) were utilized. The empirical results indicate that the proposed heterogeneous consensus clustering-based undersampling scheme yields better predictive performance.
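The general idea behind the abstract's scheme can be sketched in a few lines: cluster the majority class several times, combine the runs into a consensus partition, and keep one representative per consensus cluster. The sketch below is illustrative only, not the paper's exact method (the paper combines five different clusterers; here a minimal 1-D k-means is re-run with different seeds, the runs are merged via a co-association matrix, and each consensus cluster is reduced to its medoid). All function names and parameters are hypothetical.

```python
import random

def kmeans_1d(points, k, seed, iters=20):
    """Minimal 1-D k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest center wins
        labels = [min(range(k), key=lambda c: abs(p - centers[c])) for p in points]
        # update step: recompute each center as its members' mean
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

def consensus_undersample(majority, k=3, n_runs=5, threshold=0.5):
    """Cluster the majority class n_runs times, build a co-association
    matrix, merge points that co-cluster often enough, and keep only
    the medoid of each consensus cluster."""
    n = len(majority)
    runs = [kmeans_1d(majority, k, seed) for seed in range(n_runs)]
    # co[i][j] = fraction of runs in which points i and j share a cluster
    co = [[sum(r[i] == r[j] for r in runs) / n_runs for j in range(n)]
          for i in range(n)]
    # consensus clusters = connected components of the thresholded graph
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        start = unassigned.pop()
        comp, frontier = {start}, [start]
        while frontier:
            i = frontier.pop()
            for j in list(unassigned):
                if co[i][j] >= threshold:
                    unassigned.remove(j)
                    comp.add(j)
                    frontier.append(j)
        clusters.append(sorted(comp))
    # keep the medoid (point closest to the cluster mean) of each cluster
    kept = []
    for comp in clusters:
        mean = sum(majority[i] for i in comp) / len(comp)
        kept.append(min((majority[i] for i in comp), key=lambda p: abs(p - mean)))
    return kept

majority = [1.0, 1.2, 1.1, 5.0, 5.2, 5.1, 9.0, 9.3]
minority = [3.0, 7.0]
reduced = consensus_undersample(majority)
print(len(reduced), len(majority))  # far fewer majority instances survive
```

The minority class is left untouched; only the majority class is reduced, which is what rebalances the training set before the classification phase.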


Bibliographic Details
Main Author: Aytuğ Onan
Format: Article
Language: English
Published: Hindawi Limited, 2019-01-01
Series: Scientific Programming
Online Access: http://dx.doi.org/10.1155/2019/5901087
ISSN: 1058-9244, 1875-919X