Identifying genes that contribute most to good classification in microarrays

Abstract. Background: The goal of most microarray studies is either the identification of genes that are most differentially expressed or the creation of a good classification rule. The disadvantage of the former is that it ignores the importance of gene...

Bibliographic Details
Main Authors: Kramer Barnett S, Baker Stuart G
Format: Article
Language: English
Published: BMC 2006-09-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/7/407
id doaj-da862cf719194153826b4e40959158f4
record_format Article
spelling doaj-da862cf719194153826b4e40959158f4 2020-11-24T21:00:31Z eng BMC BMC Bioinformatics 1471-2105 2006-09-01 7 1 407 10.1186/1471-2105-7-407 Identifying genes that contribute most to good classification in microarrays Kramer Barnett S Baker Stuart G http://www.biomedcentral.com/1471-2105/7/407
collection DOAJ
language English
format Article
sources DOAJ
author Kramer Barnett S
Baker Stuart G
spellingShingle Kramer Barnett S
Baker Stuart G
Identifying genes that contribute most to good classification in microarrays
BMC Bioinformatics
author_facet Kramer Barnett S
Baker Stuart G
author_sort Kramer Barnett S
title Identifying genes that contribute most to good classification in microarrays
title_sort identifying genes that contribute most to good classification in microarrays
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2006-09-01
description Abstract. Background: The goal of most microarray studies is either the identification of genes that are most differentially expressed or the creation of a good classification rule. The disadvantage of the former is that it ignores the importance of gene interactions; the disadvantage of the latter is that it often does not provide a sufficient focus for further investigation because many genes may be included by chance. Our strategy is to search for classification rules that perform well with few genes and, if they are found, identify genes that occur relatively frequently under multiple random validation (random splits into training and test samples). Results: We analyzed data from four published studies related to cancer. For classification we used a filter with a nearest centroid rule that is easy to implement and has been previously shown to perform well. To comprehensively measure classification performance we used receiver operating characteristic curves. In the three data sets with good classification performance, the classification rules for 5 genes were only slightly worse than for 20 or 50 genes and somewhat better than for 1 gene. In two of these data sets, one or two genes had relatively high frequencies not noticeable with rules involving 20 or 50 genes: desmin for classifying colon cancer versus normal tissue; and zyxin and secretory granule proteoglycan genes for classifying two types of leukemia. Conclusion: Using multiple random validation, investigators should look for classification rules that perform well with few genes and select, for further study, genes with relatively high frequencies of occurrence in these classification rules.
url http://www.biomedcentral.com/1471-2105/7/407
work_keys_str_mv AT kramerbarnetts identifyinggenesthatcontributemosttogoodclassificationinmicroarrays
AT bakerstuartg identifyinggenesthatcontributemosttogoodclassificationinmicroarrays
_version_ 1716779486191550464
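
The strategy summarized in the abstract (a gene filter followed by a nearest centroid rule, repeated over many random training/test splits, with a tally of how often each gene is selected) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the filter statistic (a standardized difference in class means), the stratified 70/30 split, the use of the area under the ROC curve as a one-number summary, and all function and parameter names are assumptions made for the example.

```python
import numpy as np

def select_genes(X, y, k):
    """Filter step (assumed statistic): keep the k genes with the largest
    absolute standardized difference in class means on the training data."""
    a, b = X[y == 0], X[y == 1]
    pooled_sd = np.sqrt((a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1)) / 2) + 1e-12
    score = np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled_sd
    return np.argsort(score)[::-1][:k]

def centroid_scores(X_train, y_train, X_test):
    """Nearest centroid rule as a continuous score: squared distance to the
    class-0 centroid minus squared distance to the class-1 centroid."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return d0 - d1  # larger values mean the sample is closer to class 1

def auc(scores, y):
    """Area under the ROC curve from pairwise comparisons (ties count 1/2)."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def multiple_random_validation(X, y, k=5, n_splits=200, test_frac=0.3, seed=0):
    """Repeat random training/test splits; return the fraction of splits in
    which each gene was selected and the mean test-sample AUC."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p, dtype=int)
    aucs = []
    for _ in range(n_splits):
        # Stratified split so that both classes appear in training and test.
        test_idx = np.concatenate([
            rng.choice(
                np.flatnonzero(y == c),
                size=max(1, int(round(test_frac * (y == c).sum()))),
                replace=False,
            )
            for c in (0, 1)
        ])
        test_mask = np.zeros(n, dtype=bool)
        test_mask[test_idx] = True
        X_tr, y_tr = X[~test_mask], y[~test_mask]
        X_te, y_te = X[test_mask], y[test_mask]
        # Genes are chosen on the training sample only, then tallied.
        genes = select_genes(X_tr, y_tr, k)
        counts[genes] += 1
        scores = centroid_scores(X_tr[:, genes], y_tr, X_te[:, genes])
        aucs.append(auc(scores, y_te))
    return counts / n_splits, float(np.mean(aucs))
```

Here `X` is assumed to be a samples-by-genes NumPy array and `y` a 0/1 class label vector. Comparing the mean AUC for `k=1`, `k=5`, `k=20` and `k=50`, and then inspecting the genes with the highest selection frequency at small `k`, mirrors the comparison described in the Results paragraph.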