Under-Sampling and Feature Selection Algorithms for S2SMLP

Bibliographic Details
Main Authors: Shudong Liu, Ke Zhang
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/9233461/
Description
Summary: Imbalanced learning is an active topic in data mining and machine learning. Data-level, algorithm-level, and ensemble solutions are the three main approaches proposed so far to address it. This paper addresses two issues in the multilayer perceptron based on simultaneous two-sample representation (S2SMLP): data explosion and feature selection. First, we use spectral clustering to select majority-class samples and thereby construct a smaller training dataset for the classifier. We divide all majority samples into clusters through spectral clustering, extract a different number of representative samples from each cluster according to the size of the cluster and the average distance between the minority class and the samples of the cluster, and then construct the training dataset by combining these extracted majority samples with all minority samples. Second, we propose a novel feature selection method based on a pairwise sample distance constraint. The method considers the class labels of paired samples and selects the features that bring two similar samples closer together and move two different samples farther apart. Finally, we conduct extensive experiments on 44 two-class imbalanced datasets and four high-dimensional DNA microarray datasets. The experimental results demonstrate that the proposed algorithms outperform some state-of-the-art algorithms in terms of F-measure, G-mean, and AUC.
ISSN: 2169-3536
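
The two steps described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not give the formulas, so everything below is an assumption made for illustration. In particular, the cluster labels are assumed to come from a spectral clustering step run beforehand, the allocation weight (cluster size divided by average distance to the minority class) is a hypothetical choice, "similar" is taken to mean "same class label", and the function names `allocate_representatives` and `score_features` are invented here.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def allocate_representatives(majority, cluster_ids, minority, budget):
    """Decide how many majority samples to keep from each cluster.

    ASSUMPTION: weight = cluster size / average distance to the minority
    class, so larger clusters and clusters nearer the minority class keep
    more samples. The paper's actual rule may differ.
    """
    clusters = defaultdict(list)
    for point, cid in zip(majority, cluster_ids):
        clusters[cid].append(point)

    weights = {}
    for cid, points in clusters.items():
        total_dist = sum(euclidean(p, m) for p in points for m in minority)
        avg_dist = total_dist / (len(points) * len(minority))
        weights[cid] = len(points) / (avg_dist + 1e-12)

    total_weight = sum(weights.values())
    allocation = {}
    for cid, w in weights.items():
        share = round(budget * w / total_weight)
        # Keep at least one sample per cluster, never more than it holds,
        # so the total may deviate slightly from the requested budget.
        allocation[cid] = max(1, min(len(clusters[cid]), share))
    return allocation


def score_features(X, y):
    """Score each feature under a pairwise distance constraint.

    ASSUMPTION: score = mean squared difference over different-class pairs
    minus mean squared difference over same-class pairs. Higher is better.
    """
    n_features = len(X[0])
    same_sum = [0.0] * n_features
    diff_sum = [0.0] * n_features
    n_same = n_diff = 0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            same = y[i] == y[j]
            n_same += same
            n_diff += not same
            target = same_sum if same else diff_sum
            for f in range(n_features):
                target[f] += (X[i][f] - X[j][f]) ** 2
    return [
        diff_sum[f] / max(n_diff, 1) - same_sum[f] / max(n_same, 1)
        for f in range(n_features)
    ]
```

With these assumptions, the reduced training set would be formed by drawing the allocated number of samples from each majority cluster and adding all minority samples, and features would be ranked by descending score. The pairwise loop is quadratic in the number of samples, which is acceptable only for small illustrative inputs.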