SIP-FS: a novel feature selection for data representation

Abstract Multiple features are widely used to characterize real-world datasets. It is desirable to select leading features with stability and interpretability from a set of distinct features for a comprehensive data description. However, most of existing feature selection methods focus on the predic...

Full description

Bibliographic Details
Main Authors: Yiyou Guo, Jinsheng Ji, Hong Huo, Tao Fang, Deren Li
Format: Article
Language:English
Published: SpringerOpen 2018-02-01
Series:EURASIP Journal on Image and Video Processing
Subjects:
Online Access:http://link.springer.com/article/10.1186/s13640-018-0252-3
Description
Summary:Abstract Multiple features are widely used to characterize real-world datasets. It is desirable to select leading features with stability and interpretability from a set of distinct features for a comprehensive data description. However, most of existing feature selection methods focus on the predictability (e.g., prediction accuracy) of selected results yet neglect stability. To obtain compact data representation, a novel feature selection method is proposed to improve stability, and interpretability without sacrificing predictability (SIP-FS). Instead of mutual information, generalized correlation is adopted in minimal redundancy maximal relevance to measure the relation between different feature types. Several feature types (each contains a certain number of features) can then be selected and evaluated quantitatively to determine what types contribute to a specific class, thereby enhancing the so-called interpretability of features. Moreover, stability is introduced in the criterion of SIP-FS to obtain consistent results of ranking. We conduct experiments on three publicly available datasets using one-versus-all strategy to select class-specific features. The experiments illustrate that SIP-FS achieves significant performance improvements in terms of stability and interpretability with desirable prediction accuracy and indicates advantages over several state-of-the-art approaches.
ISSN:1687-5281