A New Approach for Developing Segmentation Algorithms for Strongly Imbalanced Data
Main Authors: | Kazuki Fujiwara, Maiko Shigeno, Ushio Sumita |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2019-01-01 |
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/8737955/ |
id | doaj-31781efc23864f5ca115cbdbcc274cc6 |
---|---|
record_format | Article |
volume | 7 |
pages | 82970-82977 |
doi | 10.1109/ACCESS.2019.2923524 |
affiliation | Graduate School of Systems and Information Engineering, University of Tsukuba, Ibaraki, Japan (all three authors) |
orcid | Maiko Shigeno: https://orcid.org/0000-0002-3671-9434 |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Kazuki Fujiwara, Maiko Shigeno, Ushio Sumita |
title | A New Approach for Developing Segmentation Algorithms for Strongly Imbalanced Data |
publisher | IEEE |
series | IEEE Access |
issn | 2169-3536 |
publishDate | 2019-01-01 |
description | Over the past two decades, the development of efficient segmentation algorithms for strongly imbalanced data has drawn considerable attention from researchers and practitioners in data mining. A typical approach to this difficult problem is random under-sampling, in which the majority set is randomly sampled down to the cardinality of the minority set, so that standard classifiers such as Logistic Regression, Support Vector Machine (SVM), and Random Forest can be applied. However, when the resulting segmentation algorithm is applied to test data that retain the original imbalance, its performance can be rather limited. To improve performance, a bagged under-sampling (BUS) approach has been introduced, in which random under-sampling is repeated M times; even so, the effect of BUS is still not fully satisfactory. The first purpose of this paper is to enhance the performance of BUS by employing it in a repetitive manner (R-BUS). Although R-BUS yields a recognizable improvement over BUS, it is still insufficient from a practical point of view, especially when the dimension of the underlying binary profile vectors is large. The second purpose of this paper is to establish a rank reduction (RR) approach for reducing this dimension. The combined use of R-BUS and RR provides excellent performance, as demonstrated through a large-scale real-world application. |
topic | Binary profile vectors; rank reduction approach; repetitive bagged under-sampling; strongly imbalanced data |
url | https://ieeexplore.ieee.org/document/8737955/ |
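The random under-sampling and bagged under-sampling (BUS) procedures summarized in the abstract can be illustrated with a short Python sketch. This is not the authors' code, and it does not implement R-BUS or the rank reduction (RR) approach, whose details are not given in this record. To keep the example dependency-free, the base classifier is a hypothetical nearest-centroid stand-in (`train_centroid_scorer`) used in place of Logistic Regression, SVM, or Random Forest. The function names, the `seed` parameter, and the rule of averaging the M scores are likewise assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def random_under_sample(majority: Sequence[Vector], minority: Sequence[Vector],
                        rng: random.Random) -> List[Vector]:
    """Draw len(minority) majority samples without replacement, so the
    training subset has equal cardinality for both classes."""
    if len(minority) > len(majority):
        raise ValueError("minority set must not be larger than majority set")
    return rng.sample(list(majority), len(minority))


def train_centroid_scorer(pos: Sequence[Vector], neg: Sequence[Vector]) -> Scorer:
    """Hypothetical stand-in base classifier (nearest centroid).

    Score = squared distance to the negative (majority) centroid minus
    squared distance to the positive (minority) centroid; > 0 means
    "closer to the minority class".
    """
    dim = len(pos[0])
    mu_pos = [sum(x[j] for x in pos) / len(pos) for j in range(dim)]
    mu_neg = [sum(x[j] for x in neg) / len(neg) for j in range(dim)]

    def score(x: Vector) -> float:
        d_pos = sum((a - b) ** 2 for a, b in zip(x, mu_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, mu_neg))
        return d_neg - d_pos

    return score


def bagged_under_sampling(majority: Sequence[Vector], minority: Sequence[Vector],
                          m: int, seed: int = 0) -> Scorer:
    """Repeat random under-sampling m times, train one base classifier per
    balanced subset, and average their scores (assumed aggregation rule)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    rng = random.Random(seed)
    scorers = []
    for _ in range(m):
        # Each round draws a fresh balanced subset of the majority set.
        neg = random_under_sample(majority, minority, rng)
        scorers.append(train_centroid_scorer(minority, neg))

    def ensemble(x: Vector) -> float:
        return sum(s(x) for s in scorers) / len(scorers)

    return ensemble


if __name__ == "__main__":
    # Toy binary profile vectors: 2 minority records vs. 20 majority records.
    minority = [(1, 1, 1), (1, 1, 0)]
    majority = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)] * 5
    model = bagged_under_sampling(majority, minority, m=10, seed=42)
    print(model((1, 1, 1)), model((0, 0, 0)))
```

Each ensemble member sees a balanced subset, while the averaged score is meant to be applied to test data with the original imbalance, which is the setting the abstract identifies as problematic for a single under-sampled classifier.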