Primal-dual for classification with rejection (PD-CR): a novel method for classification and feature selection—an application in metabolomics studies

Bibliographic Details
Main Authors:	Bailleux, C. (Author), Barlaud, M. (Author), Burel-Vandenbos, F. (Author), Chardin, D. (Author), Humbert, O. (Author), Pourcher, T. (Author), Rigau, V. (Author)
Format:	Article
Language:	English
Published:	BioMed Central Ltd 2021
Subjects:	Classification methods; Confidence score; Constrained optimization; Decision trees; Discriminant analysis; False discovery rate; Feature extraction; Features selection; Human; Humans; Isocitrate dehydrogenase; Least square analysis; Least squares approximations; Least-squares analysis; Metabolites; Metabolomics; Partial least squares discriminant analyses (PLSDA); Primal-dual; Random forests; Support vector machine; Support vector machines; Support vectors machine

Description
Summary:	Background: Supervised classification methods have been used for many years for feature selection in metabolomics and other omics studies. We developed a novel primal-dual based classification method (PD-CR) that can perform classification with rejection and feature selection on high-dimensional datasets. PD-CR projects data onto a low-dimensional space and performs classification by minimizing an appropriate quadratic cost. It simultaneously optimizes the selected features and the prediction accuracy with a new, tailored, constrained primal-dual method. The primal-dual framework is general enough to encompass various robust losses and to allow for convergence analysis. Here, we compare PD-CR to three commonly used methods: partial least squares discriminant analysis (PLS-DA), random forests and support vector machines (SVM). We analyzed two metabolomics datasets: a urinary metabolomics dataset from lung cancer patients and healthy controls, and a metabolomics dataset obtained from frozen glial tumor samples with mutated or wild-type isocitrate dehydrogenase (IDH). Results: PD-CR was more accurate than PLS-DA, random forests and SVM for classification on both metabolomics datasets. It also selected biologically relevant metabolites. PD-CR has the advantage of providing a confidence score for each prediction, which can be used to perform classification with rejection. This substantially reduces the false discovery rate. Conclusion: PD-CR is an accurate method for classification of metabolomics datasets that can outperform PLS-DA, random forests and SVM while selecting biologically relevant features. Furthermore, the confidence score provided with PD-CR can be used to perform classification with rejection and reduce the false discovery rate. © 2021, The Author(s).
ISSN:	1471-2105
DOI:	10.1186/s12859-021-04478-w
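
The summary states that a per-prediction confidence score can be used to reject uncertain predictions and thereby lower the false discovery rate. The sketch below illustrates that general idea only; it is not the PD-CR algorithm. The function names, the threshold of 0.8, and the toy scores and labels are all assumptions made up for illustration and do not come from the article.

```python
from typing import List, Optional, Tuple


def classify_with_rejection(
    scored: List[Tuple[int, float]], threshold: float
) -> List[Optional[int]]:
    """Keep a predicted label only if its confidence reaches the threshold.

    Rejected predictions are returned as None (hypothetical convention).
    """
    return [label if conf >= threshold else None for label, conf in scored]


def false_discovery_rate(
    predictions: List[Optional[int]], truths: List[int], positive: int = 1
) -> float:
    """FDR = FP / (FP + TP), computed over accepted positive predictions."""
    tp = sum(1 for p, t in zip(predictions, truths) if p == positive and t == positive)
    fp = sum(1 for p, t in zip(predictions, truths) if p == positive and t != positive)
    return fp / (tp + fp) if (tp + fp) else 0.0


# Toy (predicted label, confidence) pairs and true labels, invented for this sketch.
scored = [(1, 0.95), (1, 0.55), (0, 0.90), (1, 0.60), (0, 0.52), (1, 0.88)]
truths = [1, 0, 0, 1, 1, 1]

no_rejection = [label for label, _ in scored]
with_rejection = classify_with_rejection(scored, threshold=0.8)

print(false_discovery_rate(no_rejection, truths))    # 0.25
print(false_discovery_rate(with_rejection, truths))  # 0.0
print(with_rejection.count(None))                    # 3 samples rejected
```

In this toy case, rejection removes the single false positive at the cost of leaving three samples unclassified; in practice the threshold would trade coverage against error rate.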

Similar Items