A Classification Method Based on Feature Selection for Imbalanced Data

Imbalanced data are very common in the real world, and it may deteriorate the performance of the conventional classification algorithms. In order to resolve the imbalanced classification problems, we propose an ensemble classification method that combines evolutionary under-sampling and feature sele...

Full description

Bibliographic Details
Main Authors: Yi Liu, Yanzhen Wang, Xiaoguang Ren, Hao Zhou, Xingchun Diao
Format: Article
Language:English
Published: IEEE 2019-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/8740942/
Description
Summary:Imbalanced data are very common in the real world, and it may deteriorate the performance of the conventional classification algorithms. In order to resolve the imbalanced classification problems, we propose an ensemble classification method that combines evolutionary under-sampling and feature selection. We employ the Bootstrap method in original data to generate many sample subsets. V-statistic is developed to measure the distribution of imbalanced data, and it is also taken as the optimization objective of the genetic algorithm for the under-sampling sample subsets. Moreover, we take F<sub>1</sub> and Gmean indicators as two optimization objectives and employ the multiobjective ant colony optimization algorithm for feature selection of resampled data to construct an ensemble system. Ten low-dimensional and four high-dimensional typical imbalanced datasets are used in experiments. The six state-of-the-art algorithms and four measures are taken for a fair comparison. The experimental results show that our proposed system has a better classification performance compared with other algorithms, especially for the high-dimensional imbalanced data.
ISSN:2169-3536