Optimality driven nearest centroid classification from genomic data.
Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each...
Main Authors: | Alan R Dabney, John D Storey |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2007-10-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC1991588?pdf=render |
id | doaj-a2673d0649844d58ab6cddd2be53e950 |
---|---|
record_format | Article |
spelling | doaj-a2673d0649844d58ab6cddd2be53e950; 2020-11-25T01:51:13Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2007-10-01; 2; 10; e1002; 10.1371/journal.pone.0001002; Optimality driven nearest centroid classification from genomic data.; Alan R Dabney; John D Storey; Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration for how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest centroid classifiers that instead is based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest centroid classifiers.; http://europepmc.org/articles/PMC1991588?pdf=render |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Alan R Dabney, John D Storey |
title | Optimality driven nearest centroid classification from genomic data. |
publisher | Public Library of Science (PLoS) |
series | PLoS ONE |
issn | 1932-6203 |
publishDate | 2007-10-01 |
description | Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration for how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest centroid classifiers that instead is based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest centroid classifiers. |
url | http://europepmc.org/articles/PMC1991588?pdf=render |
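
The description above contrasts the article's approach with the conventional one, in which each feature is scored individually and centroids are class means. The following is a minimal sketch of that conventional baseline only, not of the authors' optimality-driven greedy algorithm or their shrinkage centroids, whose details this record does not give. The function names, the particular score (squared range of class means over pooled within-class variance), and the toy data are all assumptions made for illustration.

```python
from statistics import mean


def univariate_scores(X, y):
    """Score each feature on its own (assumed score, for illustration):
    squared range of the class means over the pooled within-class variance."""
    labels = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        groups = [[x[j] for x, lab in zip(X, y) if lab == label] for label in labels]
        means = [mean(g) for g in groups]
        within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
        within /= max(len(X) - len(labels), 1)
        between = max(means) - min(means)
        scores.append(between ** 2 / (within + 1e-12))  # small constant avoids 0/0
    return scores


def select_top_features(X, y, k):
    """Keep the k features with the highest individual scores."""
    scores = univariate_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def class_centroids(X, y, features):
    """Centroid of each class: the per-class mean of every selected feature."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[j] for r in rows) for j in features]
    return centroids


def predict(x, centroids, features):
    """Assign x to the class whose centroid is nearest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(features, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Toy data (hypothetical): only the first feature separates the two classes.
X_train = [[1.0, 5.0, 0.2], [1.2, 4.0, 0.1], [0.9, 6.0, 0.3],
           [3.0, 5.5, 0.2], [3.1, 4.5, 0.1], [2.9, 5.0, 0.3]]
y_train = ["A", "A", "A", "B", "B", "B"]

features = select_top_features(X_train, y_train, k=1)
centroids = class_centroids(X_train, y_train, features)
print(features, predict([1.1, 9.0, 0.9], centroids, features))
```

Because each feature is ranked in isolation, this baseline cannot account for how a subset of features performs jointly, which is the limitation the description says the article addresses.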