Combining Hybrid Approach Redefinition-Multiclass Imbalance (HAR-MI) and Hybrid Sampling in Handling Multi-Class Imbalance and Overlapping
Class imbalance is harder to manage in multi-class datasets than in two-class datasets, and the problem is further complicated when the classes overlap. One method that has proven reliable in dealing with this problem is the Hybrid Approach Redefinition-Multiclass...
Main Authors: | Hartono Hartono, Erianto Ongko |
---|---|
Format: | Article |
Language: | English |
Published: | Politeknik Negeri Padang, 2021-03-01 |
Series: | JOIV: International Journal on Informatics Visualization |
Subjects: | class imbalance; multi-class dataset; multi-class imbalance; hybrid approach; HAR-MI |
Online Access: | http://joiv.org/index.php/joiv/article/view/420 |
id |
doaj-77e67ed2bd8c4e75b2c53232c3a6a988 |
---|---|
record_format |
Article |
spelling |
doaj-77e67ed2bd8c4e75b2c53232c3a6a988; 2021-03-31T05:26:54Z; eng; Politeknik Negeri Padang; JOIV: International Journal on Informatics Visualization; 2549-9610; 2549-9904; 2021-03-01; 5; 1; 22; 26; 10.30630/joiv.5.1.420; 236; Combining Hybrid Approach Redefinition-Multiclass Imbalance (HAR-MI) and Hybrid Sampling in Handling Multi-Class Imbalance and Overlapping; Hartono Hartono (0); Erianto Ongko (1); Department of Computer Science, Universitas IBBI, Medan, Indonesia; Department of Informatics, Akademi Teknologi Industri Immanuel, Medan, Indonesia; http://joiv.org/index.php/joiv/article/view/420; class imbalance; multi-class dataset; multi-class imbalance; hybrid approach; har-mi |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Hartono Hartono; Erianto Ongko |
author_sort |
Hartono Hartono |
title |
Combining Hybrid Approach Redefinition-Multiclass Imbalance (HAR-MI) and Hybrid Sampling in Handling Multi-Class Imbalance and Overlapping |
title_sort |
combining hybrid approach redefinition-multiclass imbalance (har-mi) and hybrid sampling in handling multi-class imbalance and overlapping |
publisher |
Politeknik Negeri Padang |
series |
JOIV: International Journal on Informatics Visualization |
issn |
2549-9610 2549-9904 |
publishDate |
2021-03-01 |
description |
Class imbalance is harder to manage in multi-class datasets than in two-class datasets, and the problem is further complicated when the classes overlap. One method that has proven reliable for this problem is Hybrid Approach Redefinition-Multiclass Imbalance (HAR-MI), a hybrid approach that combines sampling with classifier ensembles. In terms of diversity among classifiers, a hybrid approach that combines sampling and classifier ensembles gives better results, and HAR-MI provides excellent results in handling multi-class imbalance. HAR-MI uses SMOTE to increase the number of samples in the minority class. However, SMOTE has a weakness: on an extremely imbalanced dataset with a large number of attributes, it tends to cause over-fitting. To overcome over-fitting, the Hybrid Sampling method was proposed. Combining HAR-MI with Hybrid Sampling increases the number of samples in the minority class while reducing the number of noise samples in the majority class. The preprocessing stage of HAR-MI uses the Minimizing Overlapping Selection under Hybrid Sampling (MOSHS) method, and the processing stage uses Different Contribution Sampling. The results are compared with those obtained using Neighbourhood-based under-sampling. Overlapping and classifier performance are measured using Augmented R-Value, the Matthews Correlation Coefficient (MCC), Precision, Recall, and F-Value. The results show that HAR-MI with Hybrid Sampling gives better results in terms of Augmented R-Value, Precision, Recall, and F-Value. |
topic |
class imbalance; multi-class dataset; multi-class imbalance; hybrid approach; har-mi |
url |
http://joiv.org/index.php/joiv/article/view/420 |
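
The description in this record names MCC, Precision, Recall, and F-Value as the classifier performance measures. As a rough illustration of how those four could be computed for a multi-class problem, here is a minimal Python sketch. It is not the authors' code: the function names `macro_scores` and `multiclass_mcc` are hypothetical, macro-averaging over classes is an assumption (the record does not say how per-class scores are combined), MCC is computed with the standard multi-class generalisation, and the Augmented R-Value is not covered.

```python
from collections import Counter
from math import sqrt


def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, F-value).

    Assumption: each class counts equally (macro averaging), which keeps
    minority classes from being swamped by the majority class.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f_values = [], [], []
    for k in labels:
        tp = sum(1 for t, p in pairs if t == k and p == k)
        fp = sum(1 for t, p in pairs if t != k and p == k)
        fn = sum(1 for t, p in pairs if t == k and p != k)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f_val = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f_values.append(f_val)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f_values) / n


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews Correlation Coefficient.

    Uses c (correct predictions), s (total samples), t_k (true count of
    class k) and p_k (predicted count of class k):
    MCC = (c*s - sum t_k*p_k) / sqrt((s^2 - sum p_k^2) * (s^2 - sum t_k^2))
    """
    s = len(y_true)
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    t_counts = Counter(y_true)
    p_counts = Counter(y_pred)
    labels = set(t_counts) | set(p_counts)
    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    denom = sqrt(
        (s * s - sum(v * v for v in p_counts.values()))
        * (s * s - sum(v * v for v in t_counts.values()))
    )
    # Convention: return 0.0 when the denominator vanishes
    # (for example, when every sample is predicted as one class).
    return numerator / denom if denom else 0.0


if __name__ == "__main__":
    # Toy imbalanced three-class labels, invented for illustration only.
    y_true = ["a", "a", "a", "a", "b", "b", "c"]
    y_pred = ["a", "a", "a", "b", "b", "c", "c"]
    print(macro_scores(y_true, y_pred))
    print(multiclass_mcc(y_true, y_pred))
```

Macro averaging is used here because it weights the minority classes as heavily as the majority class, which is usually the point of interest on imbalanced data; a weighted or micro average would be an equally valid choice under different assumptions.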