An Empirical Study of Univariate and Genetic Algorithm-Based Feature Selection in Binary Classification with Microarray Data
Background: We consider both univariate- and multivariate-based feature selection for the problem of binary classification with microarray data. The idea is to determine whether the more sophisticated multivariate approach leads to better misclassification error rates because of the potential to cons...
Main Authors: | Michael Lecocke, Kenneth Hess |
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing, 2006-01-01 |
Series: | Cancer Informatics |
Online Access: | https://doi.org/10.1177/117693510600200016 |
id |
doaj-be312ef94848470f9350e788da05b135 |
---|---|
record_format |
Article |
spelling |
doaj-be312ef94848470f9350e788da05b135; 2020-11-25T03:24:38Z; eng; SAGE Publishing; Cancer Informatics; 1176-9351; 2006-01-01; 2; 10.1177/117693510600200016; An Empirical Study of Univariate and Genetic Algorithm-Based Feature Selection in Binary Classification with Microarray Data; Michael Lecocke (0); Kenneth Hess (1); (0) Department of Epidemiology and Biostatistics, University of Texas Health Science Center, San Antonio, Texas 78229, U.S.A.; (1) Department of Biostatistics and Applied Mathematics, UT MD Anderson Cancer Center, Houston, Texas 77030, U.S.A.; Background: We consider both univariate- and multivariate-based feature selection for the problem of binary classification with microarray data. The idea is to determine whether the more sophisticated multivariate approach leads to better misclassification error rates because of the potential to consider jointly significant subsets of genes (but without overfitting the data). Methods: We present an empirical study in which 10-fold cross-validation is applied externally to both a univariate-based and two multivariate- (genetic algorithm (GA)-) based feature selection processes. These procedures are applied with respect to three supervised learning algorithms and six published two-class microarray datasets. Results: Considering all datasets, and learning algorithms, the average 10-fold external cross-validation error rates for the univariate-, single-stage GA-, and two-stage GA-based processes are 14.2%, 14.6%, and 14.2%, respectively. We also find that the optimism bias estimates from the GA analyses were half that of the univariate approach, but the selection bias estimates from the GA analyses were 2.5 times that of the univariate results. Conclusions: We find that the 10-fold external cross-validation misclassification error rates were very comparable. Further, we find that a two-stage GA approach did not demonstrate a significant advantage over a 1-stage approach. We also find that the univariate approach had higher optimism bias and lower selection bias compared to both GA approaches.; https://doi.org/10.1177/117693510600200016 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Michael Lecocke Kenneth Hess |
title |
An Empirical Study of Univariate and Genetic Algorithm-Based Feature Selection in Binary Classification with Microarray Data |
publisher |
SAGE Publishing |
series |
Cancer Informatics |
issn |
1176-9351 |
publishDate |
2006-01-01 |
description |
Background: We consider both univariate- and multivariate-based feature selection for the problem of binary classification with microarray data. The idea is to determine whether the more sophisticated multivariate approach leads to better misclassification error rates because of the potential to consider jointly significant subsets of genes (but without overfitting the data). Methods: We present an empirical study in which 10-fold cross-validation is applied externally to both a univariate-based and two multivariate- (genetic algorithm (GA)-) based feature selection processes. These procedures are applied with respect to three supervised learning algorithms and six published two-class microarray datasets. Results: Considering all datasets, and learning algorithms, the average 10-fold external cross-validation error rates for the univariate-, single-stage GA-, and two-stage GA-based processes are 14.2%, 14.6%, and 14.2%, respectively. We also find that the optimism bias estimates from the GA analyses were half that of the univariate approach, but the selection bias estimates from the GA analyses were 2.5 times that of the univariate results. Conclusions: We find that the 10-fold external cross-validation misclassification error rates were very comparable. Further, we find that a two-stage GA approach did not demonstrate a significant advantage over a 1-stage approach. We also find that the univariate approach had higher optimism bias and lower selection bias compared to both GA approaches. |
url |
https://doi.org/10.1177/117693510600200016 |
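
The abstract describes 10-fold cross-validation applied "externally" to feature selection, which means the genes are re-selected inside every training fold so that the held-out samples never influence which features are chosen. The following is a minimal Python sketch of that idea for the univariate case only. It is not the authors' implementation: the synthetic Gaussian data, the t-statistic ranking, the nearest-centroid classifier, and all function names and parameter values are assumptions made for illustration.

```python
import random
import statistics


def make_data(n_samples=60, n_features=50, n_informative=3, shift=2.0, seed=1):
    """Synthetic two-class data (assumption): only the first few features differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += shift * label
        X.append(row)
        y.append(label)
    return X, y


def t_score(a, b):
    """Absolute unequal-variance t statistic for one feature across the two classes."""
    denom = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return abs(statistics.fmean(a) - statistics.fmean(b)) / denom if denom > 0 else 0.0


def select_features(X, y, k):
    """Univariate selection: rank features by |t| on the given samples, keep the top k."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((t_score(a, b), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def fit_centroids(X, y, features):
    """Per-class mean vector restricted to the selected features."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.fmean(r[j] for r in rows) for j in features]
    return centroids


def predict(row, centroids, features):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(label):
        return sum((row[j] - c) ** 2 for j, c in zip(features, centroids[label]))
    return min(centroids, key=dist)


def external_cv_error(X, y, k_features=5, n_folds=10, seed=0):
    """Misclassification rate with feature selection redone inside each training fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    errors = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        # Selection sees training samples only; this is what makes the CV "external".
        feats = select_features(X_tr, y_tr, k_features)
        cents = fit_centroids(X_tr, y_tr, feats)
        errors += sum(predict(X[i], cents, feats) != y[i] for i in test_idx)
    return errors / len(X)


if __name__ == "__main__":
    X, y = make_data()
    print("external 10-fold CV error:", external_cv_error(X, y))
```

Selecting features once on the full dataset and cross-validating only the classifier would let the held-out samples influence the gene list, which is the source of the selection bias the abstract refers to. A genetic-algorithm search could replace `select_features` in the same loop, but that variant is not sketched here.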