A New Approach for Developing Segmentation Algorithms for Strongly Imbalanced Data

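Over the past two decades, the problem of developing efficient segmentation algorithms for strongly imbalanced data has drawn considerable attention from researchers and practitioners in data mining. A typical approach to this difficult problem is random under-sampling, in which the cardinality of the majority set is reduced to that of the minority set through random sampling, thereby enabling the use of standard classifiers such as Logistic Regression, Support Vector Machine (SVM), and Random Forest. When the resulting segmentation algorithm is applied to testing data with the original imbalance, however, its performance can be rather limited. To improve performance, a bagged under-sampling (BUS) approach has been introduced, in which random under-sampling is repeated M times, though the effect of BUS is still not quite satisfactory. The first purpose of this paper is to enhance the performance of BUS by employing it in a repetitive manner. While the improvement of this approach (R-BUS) over BUS is recognizable, it is still not sufficient from a practical point of view, especially when the dimension of the underlying binary profile vectors is large. The second purpose of this paper is to establish a rank reduction (RR) approach for reducing this large dimension. The combined use of R-BUS with RR provides excellent performance, as demonstrated through a large-scale real-world application.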
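As a rough illustration of the two baseline ideas named in the abstract, random under-sampling and bagged under-sampling (BUS), the following Python sketch may help. It is not the authors' implementation: the function names, the majority-vote aggregation, and the toy nearest-centroid base learner are all assumptions made for the example (the abstract mentions Logistic Regression, SVM, and Random Forest as typical classifiers), and it does not attempt R-BUS or the rank reduction approach, whose details are not given in this record.

```python
import random
from statistics import mean


def under_sample(majority, minority, rng):
    """Randomly reduce the majority set to the size of the minority set."""
    return rng.sample(majority, len(minority)), list(minority)


def fit_centroids(majority, minority):
    """Toy base learner (an assumption): store the per-class feature means."""
    dim = len(minority[0])
    c_major = [mean(x[j] for x in majority) for j in range(dim)]
    c_minor = [mean(x[j] for x in minority) for j in range(dim)]
    return c_major, c_minor


def predict_one(model, x):
    """Return 1 (minority) if x is closer to the minority centroid, else 0."""
    c_major, c_minor = model
    d_major = sum((a - b) ** 2 for a, b in zip(x, c_major))
    d_minor = sum((a - b) ** 2 for a, b in zip(x, c_minor))
    return 1 if d_minor < d_major else 0


def bagged_under_sampling(majority, minority, M, seed=0):
    """Repeat random under-sampling M times, fitting one model per balanced bag."""
    rng = random.Random(seed)
    models = []
    for _ in range(M):
        maj_bag, min_bag = under_sample(majority, minority, rng)
        models.append(fit_centroids(maj_bag, min_bag))
    return models


def predict_bus(models, x):
    """Aggregate the M models by majority vote (one possible choice)."""
    votes = sum(predict_one(m, x) for m in models)
    return 1 if 2 * votes > len(models) else 0
```

With hypothetical binary profile vectors, for example 200 majority rows and 10 minority rows, `bagged_under_sampling(majority, minority, M=5)` would return five models, each trained on a balanced bag of 10 majority and 10 minority rows, and `predict_bus` would combine their votes.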

Bibliographic Details
Main Authors:	Kazuki Fujiwara, Maiko Shigeno, Ushio Sumita
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	Binary profile vectors, rank reduction approach, repetitive bagged under-sampling, strongly imbalanced data
Online Access:	https://ieeexplore.ieee.org/document/8737955/

id	doaj-31781efc23864f5ca115cbdbcc274cc6
record_format	Article
spelling	doaj-31781efc23864f5ca115cbdbcc274cc6 2021-03-30T00:17:19Z eng IEEE IEEE Access 2169-3536 2019-01-01 7 82970 82977 10.1109/ACCESS.2019.2923524 8737955 A New Approach for Developing Segmentation Algorithms for Strongly Imbalanced Data Kazuki Fujiwara (0), Maiko Shigeno (1) https://orcid.org/0000-0002-3671-9434, Ushio Sumita (2) Graduate School of Systems and Information Engineering, University of Tsukuba, Ibaraki, Japan (affiliation of all three authors) https://ieeexplore.ieee.org/document/8737955/ Binary profile vectors, rank reduction approach, repetitive bagged under-sampling, strongly imbalanced data
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Kazuki Fujiwara Maiko Shigeno Ushio Sumita
spellingShingle	Kazuki Fujiwara Maiko Shigeno Ushio Sumita A New Approach for Developing Segmentation Algorithms for Strongly Imbalanced Data IEEE Access Binary profile vectors rank reduction approach repetitive bagged under-sampling strongly imbalanced data
author_facet	Kazuki Fujiwara Maiko Shigeno Ushio Sumita
author_sort	Kazuki Fujiwara
title	A New Approach for Developing Segmentation Algorithms for Strongly Imbalanced Data
title_sort	new approach for developing segmentation algorithms for strongly imbalanced data
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2019-01-01
description	Over the past two decades, the problem of developing efficient segmentation algorithms for strongly imbalanced data has drawn considerable attention from researchers and practitioners in data mining. A typical approach to this difficult problem is random under-sampling, in which the cardinality of the majority set is reduced to that of the minority set through random sampling, thereby enabling the use of standard classifiers such as Logistic Regression, Support Vector Machine (SVM), and Random Forest. When the resulting segmentation algorithm is applied to testing data with the original imbalance, however, its performance can be rather limited. To improve performance, a bagged under-sampling (BUS) approach has been introduced, in which random under-sampling is repeated M times, though the effect of BUS is still not quite satisfactory. The first purpose of this paper is to enhance the performance of BUS by employing it in a repetitive manner. While the improvement of this approach (R-BUS) over BUS is recognizable, it is still not sufficient from a practical point of view, especially when the dimension of the underlying binary profile vectors is large. The second purpose of this paper is to establish a rank reduction (RR) approach for reducing this large dimension. The combined use of R-BUS with RR provides excellent performance, as demonstrated through a large-scale real-world application.
topic	Binary profile vectors, rank reduction approach, repetitive bagged under-sampling, strongly imbalanced data
url	https://ieeexplore.ieee.org/document/8737955/

Similar Items