A New Under-Sampling Method to Face Class Overlap and Imbalance

Class overlap and class imbalance are two data complexities that challenge the design of effective classifiers in Pattern Recognition and Data Mining as they may cause a significant loss in performance. Several solutions have been proposed to face both data difficulties, but most of these approaches...


Bibliographic Details
Main Authors:	Angélica Guzmán-Ponce, Rosa María Valdovinos, José Salvador Sánchez, José Raymundo Marcial-Romero
Format:	Article
Language:	English
Published:	MDPI AG 2020-07-01
Series:	Applied Sciences
Subjects:	class imbalance, class overlap, under-sampling, clustering, DBSCAN, minimum spanning tree
Online Access:	https://www.mdpi.com/2076-3417/10/15/5164

id	doaj-f6c76d46bf154a99bdd53ef1f740abfb
record_format	Article
spelling	doaj-f6c76d46bf154a99bdd53ef1f740abfb; 2020-11-25T03:37:39Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2020-07-01; 10; 5164; 5164; 10.3390/app10155164; A New Under-Sampling Method to Face Class Overlap and Imbalance; Angélica Guzmán-Ponce (0); Rosa María Valdovinos (1); José Salvador Sánchez (2); José Raymundo Marcial-Romero (3); Facultad de Ingeniería, Universidad Autónoma del Estado de Mexico, Cerro de Coatepec s/n, Ciudad Universitaria, Toluca 50100, Mexico; Facultad de Ingeniería, Universidad Autónoma del Estado de Mexico, Cerro de Coatepec s/n, Ciudad Universitaria, Toluca 50100, Mexico; Department of Computer Languages and Systems, Institute of New Imaging Technologies, Universitat Jaume I, 12071 Castelló de la Plana, Spain; Facultad de Ingeniería, Universidad Autónoma del Estado de Mexico, Cerro de Coatepec s/n, Ciudad Universitaria, Toluca 50100, Mexico; Class overlap and class imbalance are two data complexities that challenge the design of effective classifiers in Pattern Recognition and Data Mining as they may cause a significant loss in performance. Several solutions have been proposed to face both data difficulties, but most of these approaches tackle each problem separately. In this paper, we propose a two-stage under-sampling technique that combines the DBSCAN clustering algorithm to remove noisy samples and clean the decision boundary with a minimum spanning tree algorithm to face the class imbalance, thus handling class overlap and imbalance simultaneously with the aim of improving the performance of classifiers. An extensive experimental study shows a significantly better behavior of the new algorithm as compared to 12 state-of-the-art under-sampling methods using three standard classification models (nearest neighbor rule, J48 decision tree, and support vector machine with a linear kernel) on both real-life and synthetic databases.; https://www.mdpi.com/2076-3417/10/15/5164; class imbalance; class overlap; under-sampling; clustering; DBSCAN; minimum spanning tree
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Angélica Guzmán-Ponce, Rosa María Valdovinos, José Salvador Sánchez, José Raymundo Marcial-Romero
spellingShingle	Angélica Guzmán-Ponce Rosa María Valdovinos José Salvador Sánchez José Raymundo Marcial-Romero A New Under-Sampling Method to Face Class Overlap and Imbalance Applied Sciences class imbalance class overlap under-sampling clustering DBSCAN minimum spanning tree
author_facet	Angélica Guzmán-Ponce, Rosa María Valdovinos, José Salvador Sánchez, José Raymundo Marcial-Romero
author_sort	Angélica Guzmán-Ponce
title	A New Under-Sampling Method to Face Class Overlap and Imbalance
title_short	A New Under-Sampling Method to Face Class Overlap and Imbalance
title_full	A New Under-Sampling Method to Face Class Overlap and Imbalance
title_fullStr	A New Under-Sampling Method to Face Class Overlap and Imbalance
title_full_unstemmed	A New Under-Sampling Method to Face Class Overlap and Imbalance
title_sort	new under-sampling method to face class overlap and imbalance
publisher	MDPI AG
series	Applied Sciences
issn	2076-3417
publishDate	2020-07-01
description	Class overlap and class imbalance are two data complexities that challenge the design of effective classifiers in Pattern Recognition and Data Mining as they may cause a significant loss in performance. Several solutions have been proposed to face both data difficulties, but most of these approaches tackle each problem separately. In this paper, we propose a two-stage under-sampling technique that combines the DBSCAN clustering algorithm to remove noisy samples and clean the decision boundary with a minimum spanning tree algorithm to face the class imbalance, thus handling class overlap and imbalance simultaneously with the aim of improving the performance of classifiers. An extensive experimental study shows a significantly better behavior of the new algorithm as compared to 12 state-of-the-art under-sampling methods using three standard classification models (nearest neighbor rule, J48 decision tree, and support vector machine with a linear kernel) on both real-life and synthetic databases.
topic	class imbalance, class overlap, under-sampling, clustering, DBSCAN, minimum spanning tree
url	https://www.mdpi.com/2076-3417/10/15/5164
work_keys_str_mv	AT angelicaguzmanponce anewundersamplingmethodtofaceclassoverlapandimbalance AT rosamariavaldovinos anewundersamplingmethodtofaceclassoverlapandimbalance AT josesalvadorsanchez anewundersamplingmethodtofaceclassoverlapandimbalance AT joseraymundomarcialromero anewundersamplingmethodtofaceclassoverlapandimbalance AT angelicaguzmanponce newundersamplingmethodtofaceclassoverlapandimbalance AT rosamariavaldovinos newundersamplingmethodtofaceclassoverlapandimbalance AT josesalvadorsanchez newundersamplingmethodtofaceclassoverlapandimbalance AT joseraymundomarcialromero newundersamplingmethodtofaceclassoverlapandimbalance
_version_	1724544719117090816
