A New Approach for Developing Segmentation Algorithms for Strongly Imbalanced Data

Over the past two decades, developing efficient segmentation algorithms for strongly imbalanced data has drawn considerable attention from researchers and practitioners in data mining. A typical approach for this difficult problem is represented by a rando...

Bibliographic Details
Main Authors: Kazuki Fujiwara, Maiko Shigeno, Ushio Sumita
Format: Article
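Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Binary profile vectors; rank reduction approach; repetitive bagged under-sampling; strongly imbalanced data
Online Access: https://ieeexplore.ieee.org/document/8737955/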
id doaj-31781efc23864f5ca115cbdbcc274cc6
record_format Article
spelling doaj-31781efc23864f5ca115cbdbcc274cc6
2021-03-30T00:17:19Z
eng
IEEE
IEEE Access
ISSN 2169-3536
2019-01-01
vol. 7, pp. 82970-82977
DOI 10.1109/ACCESS.2019.2923524
IEEE document 8737955
A New Approach for Developing Segmentation Algorithms for Strongly Imbalanced Data
Kazuki Fujiwara, Graduate School of Systems and Information Engineering, University of Tsukuba, Ibaraki, Japan
Maiko Shigeno (https://orcid.org/0000-0002-3671-9434), Graduate School of Systems and Information Engineering, University of Tsukuba, Ibaraki, Japan
Ushio Sumita, Graduate School of Systems and Information Engineering, University of Tsukuba, Ibaraki, Japan
collection DOAJ
language English
format Article
sources DOAJ
author Kazuki Fujiwara
Maiko Shigeno
Ushio Sumita
title A New Approach for Developing Segmentation Algorithms for Strongly Imbalanced Data
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description Over the past two decades, developing efficient segmentation algorithms for strongly imbalanced data has drawn considerable attention from researchers and practitioners in data mining. A typical approach to this difficult problem is random under-sampling, in which the cardinality of the majority set is reduced to that of the minority set through random sampling, so that standard classifiers such as Logistic Regression, Support Vector Machine (SVM), and Random Forest can be used. When the resulting segmentation algorithm is applied to testing data with the original imbalance, however, its performance can be rather limited. To improve performance, a bagged under-sampling (BUS) approach has been introduced, in which random under-sampling is repeated M times, though the effect of BUS is still not quite satisfactory. The first purpose of this paper is to enhance the performance of BUS by developing a novel scheme in which BUS is employed repetitively. While the improvement of this approach (R-BUS) over BUS is recognizable, it is still not sufficient from a practical point of view, especially when the dimension of the underlying binary profile vectors is large. The second purpose of this paper is to establish a rank reduction (RR) approach for reducing this large dimension. The combined use of R-BUS and RR provides excellent performance, as demonstrated through a large-scale real-world application.
topic Binary profile vectors
rank reduction approach
repetitive bagged under-sampling
strongly imbalanced data
url https://ieeexplore.ieee.org/document/8737955/
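
The description above names random under-sampling and bagged under-sampling (BUS) as the baselines the paper builds on. The following is a minimal sketch of those two ideas only, not the authors' implementation: the nearest-centroid base classifier, the majority-vote aggregation, the toy data, and all function names are assumptions made for illustration. The record does not specify how R-BUS or the rank reduction (RR) step work, so neither is sketched here.

```python
import random
from statistics import mean


def random_under_sample(X, y, rng):
    """Keep every minority row plus an equally sized random draw of majority rows."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    return [X[i] for i in keep], [y[i] for i in keep]


def fit_centroids(X, y):
    """Stand-in base classifier (assumption): the mean vector of each class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict_centroids(centroids, x):
    """Return the label whose centroid is nearest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def bagged_under_sampling(X, y, M, seed=0):
    """Repeat random under-sampling M times and fit one base model per balanced sample."""
    rng = random.Random(seed)
    return [fit_centroids(*random_under_sample(X, y, rng)) for _ in range(M)]


def predict_bus(models, x):
    """Aggregate the M base models by majority vote (assumed aggregation rule)."""
    votes = sum(predict_centroids(m, x) for m in models)
    return 1 if 2 * votes > len(models) else 0


# Toy binary profile vectors: 3 minority rows (label 1), 30 majority rows (label 0).
X = [[1, 1, 0], [1, 1, 1], [1, 1, 0]] + [[0, 0, 0], [0, 0, 1], [0, 1, 0]] * 10
y = [1, 1, 1] + [0] * 30

Xb, yb = random_under_sample(X, y, random.Random(0))
models = bagged_under_sampling(X, y, M=5)
```

Each balanced sample contains as many majority rows as minority rows, so a standard classifier such as logistic regression could replace the centroid stand-in without changing the sampling logic.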