A Decoupling and Bidirectional Resampling Method for Multilabel Classification of Imbalanced Data with Label Concurrence
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2020-01-01 |
Series: | Scientific Programming |
Online Access: | http://dx.doi.org/10.1155/2020/8829432 |
Summary: | Label imbalance is a common characteristic of multilabel data, and it seriously degrades classifier performance. In multilabel classification, imbalance is mostly handled with resampling methods. Existing resampling methods balance the data by either undersampling or oversampling, which causes information loss or overfitting, respectively, and resampling has a significant impact on the minority labels. Furthermore, the high concurrence of majority and minority labels within many instances also degrades classification performance. In this study, we propose a bidirectional resampling method that decouples multilabel datasets. On the one hand, label concurrence is reduced by setting termination conditions for decoupling; on the other hand, the loss of instance information and overfitting are alleviated by combining oversampling and undersampling. Each instance's minority labels are measured, and the instances that have less impact on minority labels are selected for resampling. The amount of resampling is limited during the resampling phase to preserve the original distribution of the data. Experiments on seven benchmark multilabel datasets demonstrate the effectiveness of the algorithm, especially on datasets with high concurrence of majority and minority labels. |
ISSN: | 1058-9244 1875-919X |
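
The summary describes decoupling only in outline, so the following is a minimal Python sketch of the general idea and not the authors' algorithm. It assumes that a label counts as a minority label when its imbalance ratio (count of the most frequent label divided by the label's own count) exceeds the mean ratio, and it uses a hypothetical `max_splits` parameter as a stand-in for the termination condition. All function names are illustrative.

```python
from collections import Counter


def imbalance_ratios(label_sets):
    """Per-label ratio: count of the most frequent label / count of this label."""
    counts = Counter(label for labels in label_sets for label in labels)
    top = max(counts.values())
    return {label: top / n for label, n in counts.items()}


def minority_labels(label_sets):
    """Labels whose ratio exceeds the mean ratio (an assumed threshold)."""
    ratios = imbalance_ratios(label_sets)
    mean_ratio = sum(ratios.values()) / len(ratios)
    return {label for label, r in ratios.items() if r > mean_ratio}


def decouple(instances, max_splits):
    """Split instances that mix majority and minority labels.

    instances: list of (features, set_of_labels) pairs.
    max_splits: hypothetical termination condition that caps the number
    of instances split, so the data distribution is not changed too much.
    """
    minority = minority_labels([labels for _, labels in instances])
    result, splits = [], 0
    for features, labels in instances:
        rare, common = labels & minority, labels - minority
        if rare and common and splits < max_splits:
            # One copy keeps only the minority labels, the other the rest.
            result.append((features, rare))
            result.append((features, common))
            splits += 1
        else:
            result.append((features, labels))
    return result


data = [
    ([0.1], {"a"}),
    ([0.2], {"a"}),
    ([0.3], {"a", "b"}),
    ([0.4], {"a", "c"}),
]
decoupled = decouple(data, max_splits=1)
```

Here label `a` is the majority label and `b` and `c` are minority labels. With `max_splits=1`, only the first mixed instance is split into two single-purpose copies; the second mixed instance is left untouched once the cap is reached.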