Optimality driven nearest centroid classification from genomic data.

Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each...

Bibliographic Details
Main Authors: Alan R Dabney, John D Storey
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2007-10-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC1991588?pdf=render
id doaj-a2673d0649844d58ab6cddd2be53e950
record_format Article
spelling doaj-a2673d0649844d58ab6cddd2be53e950 | 2020-11-25T01:51:13Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2007-10-01 | 2(10): e1002 | doi:10.1371/journal.pone.0001002
collection DOAJ
sources DOAJ
author Alan R Dabney
John D Storey
title Optimality driven nearest centroid classification from genomic data.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2007-10-01
description Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration for how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest centroid classifiers that instead is based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest centroid classifiers.
url http://europepmc.org/articles/PMC1991588?pdf=render