Combining Hybrid Approach Redefinition-Multiclass Imbalance (HAR-MI) and Hybrid Sampling in Handling Multi-Class Imbalance and Overlapping

Class imbalance in a multi-class dataset is more challenging to manage than in a two-class dataset, and the problem becomes more complicated when it is accompanied by class overlap. One method that has proven reliable in dealing with this problem is the Hybrid Approach Redefinition-Multiclass...

Bibliographic Details
Main Authors: Hartono Hartono, Erianto Ongko
Format: Article
Language: English
Published: Politeknik Negeri Padang 2021-03-01
Series: JOIV: International Journal on Informatics Visualization
Subjects: class imbalance; multi-class dataset; multi-class imbalance; hybrid approach; HAR-MI
Online Access: http://joiv.org/index.php/joiv/article/view/420
id doaj-77e67ed2bd8c4e75b2c53232c3a6a988
record_format Article
spelling doaj-77e67ed2bd8c4e75b2c53232c3a6a988
2021-03-31T05:26:54Z
eng
Politeknik Negeri Padang
JOIV: International Journal on Informatics Visualization
2549-9610
2549-9904
2021-03-01
5 1 22 26
10.30630/joiv.5.1.420
236
Combining Hybrid Approach Redefinition-Multiclass Imbalance (HAR-MI) and Hybrid Sampling in Handling Multi-Class Imbalance and Overlapping
Hartono Hartono (0)
Erianto Ongko (1)
(0) Department of Computer Science, Universitas IBBI, Medan, Indonesia
(1) Department of Informatics, Akademi Teknologi Industri Immanuel, Medan, Indonesia
http://joiv.org/index.php/joiv/article/view/420
class imbalance
multi-class dataset
multi-class imbalance
hybrid approach
har-mi.
collection DOAJ
language English
format Article
sources DOAJ
author Hartono Hartono
Erianto Ongko
title Combining Hybrid Approach Redefinition-Multiclass Imbalance (HAR-MI) and Hybrid Sampling in Handling Multi-Class Imbalance and Overlapping
publisher Politeknik Negeri Padang
series JOIV: International Journal on Informatics Visualization
issn 2549-9610
2549-9904
publishDate 2021-03-01
description Class imbalance in a multi-class dataset is more challenging to manage than in a two-class dataset, and the problem becomes more complicated when it is accompanied by class overlap. One method that has proven reliable in dealing with this problem is the Hybrid Approach Redefinition-Multiclass Imbalance (HAR-MI) method, which is classified as a hybrid approach that combines sampling and classifier ensembles. In terms of diversity among classifiers, a hybrid approach that combines sampling and classifier ensembles gives better results, and HAR-MI provides excellent results in handling multi-class imbalance. The HAR-MI method uses SMOTE to increase the number of samples in the minority class. However, SMOTE has a weakness: on an extremely imbalanced dataset with a large number of attributes, it is prone to over-fitting. To overcome this over-fitting problem, the Hybrid Sampling method was proposed. Combining HAR-MI with Hybrid Sampling increases the number of samples in the minority class and, at the same time, reduces the number of noise samples in the majority class. The preprocessing stage of HAR-MI uses the Minimizing Overlapping Selection under Hybrid Sampling (MOSHS) method, and the processing stage uses Different Contribution Sampling. The results are compared with those obtained using Neighbourhood-based under-sampling. Overlapping and classifier performance are measured using the Augmented R-Value, the Matthews Correlation Coefficient (MCC), Precision, Recall, and F-Value. The results show that HAR-MI with Hybrid Sampling gives better results in terms of Augmented R-Value, Precision, Recall, and F-Value.
topic class imbalance
multi-class dataset
multi-class imbalance
hybrid approach
HAR-MI
url http://joiv.org/index.php/joiv/article/view/420
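
The description names the Matthews Correlation Coefficient (MCC), Precision, Recall, and F-Value as classifier performance measures. As a rough illustration only, the Python sketch below computes these from a multi-class confusion matrix. It is not the authors' code: the function names, the choice of macro-averaging, and the toy confusion matrix are all assumptions made here, since this record does not say how the per-class scores were aggregated. The Augmented R-Value is not covered.

```python
from math import sqrt


def per_class_counts(cm):
    """Return (true_totals, predicted_totals, correct) per class.

    cm is a square confusion matrix where cm[i][j] is the number of
    samples of true class i that were predicted as class j.
    """
    k = len(cm)
    true_totals = [sum(cm[i]) for i in range(k)]
    pred_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    correct = [cm[i][i] for i in range(k)]
    return true_totals, pred_totals, correct


def macro_scores(cm):
    """Macro-averaged precision, recall and F-value (assumed averaging)."""
    true_totals, pred_totals, correct = per_class_counts(cm)
    precisions, recalls, f_values = [], [], []
    for t, p, c in zip(true_totals, pred_totals, correct):
        precision = c / p if p else 0.0   # no predictions for this class
        recall = c / t if t else 0.0      # no true samples of this class
        denom = precision + recall
        f_value = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_values.append(f_value)
    k = len(cm)
    return sum(precisions) / k, sum(recalls) / k, sum(f_values) / k


def multiclass_mcc(cm):
    """Multi-class generalisation of MCC computed from a confusion matrix."""
    true_totals, pred_totals, correct = per_class_counts(cm)
    s = sum(true_totals)                  # total number of samples
    c = sum(correct)                      # correctly classified samples
    numerator = c * s - sum(p * t for p, t in zip(pred_totals, true_totals))
    denominator = sqrt(
        (s * s - sum(p * p for p in pred_totals))
        * (s * s - sum(t * t for t in true_totals))
    )
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical imbalanced 3-class result, for illustration only.
    toy_cm = [[50, 3, 2],
              [4, 10, 1],
              [2, 1, 5]]
    print("macro precision, recall, F:", macro_scores(toy_cm))
    print("MCC:", multiclass_mcc(toy_cm))
```

Macro-averaging weights every class equally, which is one common choice for imbalanced data because minority classes are not swamped by the majority class; other averaging schemes would give different numbers.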