A Parameter-Free Cleaning Method for SMOTE in Imbalanced Classification
Oversampling is an efficient technique for dealing with the class-imbalance problem. It addresses the problem by replicating existing minority class samples or generating new ones, balancing the distribution between the majority and minority classes. Synthetic minority oversampling technique (SMOTE)...
Main Authors: | Yuanting Yan, Ruiqing Liu, Zihan Ding, Xiuquan Du, Jie Chen, Yanping Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2019-01-01 |
Series: | IEEE Access |
Subjects: | Imbalanced data, SMOTE, oversampling, constructive covering algorithm, data cleaning |
Online Access: | https://ieeexplore.ieee.org/document/8642396/ |
id |
doaj-1a55d3970ca44b89aa96f6b730fbdfff |
---|---|
record_format |
Article |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Yuanting Yan Ruiqing Liu Zihan Ding Xiuquan Du Jie Chen Yanping Zhang |
author_sort |
Yuanting Yan |
title |
A Parameter-Free Cleaning Method for SMOTE in Imbalanced Classification |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2019-01-01 |
description |
Oversampling is an efficient technique for dealing with the class-imbalance problem. It addresses the problem by replicating existing minority class samples or generating new ones, balancing the distribution between the majority and minority classes. The synthetic minority oversampling technique (SMOTE) is a typical representative. Over the past decade, researchers have proposed many variants of SMOTE. However, existing oversampling methods may generate wrong minority class samples in some scenarios. Furthermore, effectively mining the inherent complex characteristics of imbalanced data remains a challenge. To this end, this paper proposes a parameter-free data cleaning method that improves SMOTE based on the constructive covering algorithm. The dataset generated by SMOTE is first partitioned into a group of covers; the hard-to-learn samples are then detected based on the characteristics of the sample space distribution. Finally, a pair-wise deletion strategy is proposed to remove the hard-to-learn samples. Experimental results on 25 imbalanced datasets show that the proposed method is superior to the comparison methods in terms of various metrics, such as F-measure, G-mean, and Recall. The method not only reduces the complexity of the dataset but also improves the performance of the classification model. |
topic |
Imbalanced data SMOTE oversampling constructive covering algorithm data cleaning |
url |
https://ieeexplore.ieee.org/document/8642396/ |
_version_ |
1724191119934226432 |
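
For readers unfamiliar with the terms in the description, the following is a minimal sketch of two ideas it mentions: the basic SMOTE step (a synthetic minority sample is interpolated between a minority sample and one of its nearest minority neighbours) and the standard binary definitions of Recall, F-measure, and G-mean. It is not the paper's cleaning method; the constructive covering partition and the pair-wise deletion strategy are not specified in this record and are not reproduced here. The function names `smote` and `recall_fmeasure_gmean` and the parameters `k`, `seed`, and `positive` are illustrative assumptions.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE idea (illustrative, not the paper's code): each synthetic
    point lies on the segment between a minority sample and one of its k
    nearest minority neighbours. Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def recall_fmeasure_gmean(y_true, y_pred, positive=1):
    """Recall, F-measure (F1), and G-mean for a binary problem in which
    `positive` marks the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # G-mean: geometric mean of minority recall and majority recall
    return recall, f1, math.sqrt(recall * specificity)


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print(smote(minority, n_new=4, k=2))
    print(recall_fmeasure_gmean([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

Because every synthetic point is a convex combination of two minority samples, it stays inside the region spanned by the minority class. When that region overlaps the majority class, interpolation can place synthetic points among majority samples, which is presumably the kind of "wrong" sample the description says a cleaning step is meant to remove.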