A comparative review of estimates of the proportion unchanged genes and the false discovery rate

Abstract. Background: In the analysis of microarray data one generally produces a vector of p-values that for each gene give the likelihood of obtaining equally strong evidence of change by pure chance. The distribution of these ...


Bibliographic Details
Main Author: Broberg Per
Format: Article
Language: English
Published: BMC 2005-08-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/6/199
id doaj-18b08977546c47a29ac25343ad46ebb4
record_format Article
spelling doaj-18b08977546c47a29ac25343ad46ebb4 2020-11-25T02:47:36Z eng BMC BMC Bioinformatics 1471-2105 2005-08-01 6 1 199 10.1186/1471-2105-6-199 A comparative review of estimates of the proportion unchanged genes and the false discovery rate Broberg Per http://www.biomedcentral.com/1471-2105/6/199
collection DOAJ
language English
format Article
sources DOAJ
author Broberg Per
spellingShingle Broberg Per
A comparative review of estimates of the proportion unchanged genes and the false discovery rate
BMC Bioinformatics
author_facet Broberg Per
author_sort Broberg Per
title A comparative review of estimates of the proportion unchanged genes and the false discovery rate
title_sort comparative review of estimates of the proportion unchanged genes and the false discovery rate
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2005-08-01
description Abstract
Background: In the analysis of microarray data one generally produces a vector of p-values that for each gene give the likelihood of obtaining equally strong evidence of change by pure chance. The distribution of these p-values is a mixture of two components corresponding to the changed genes and the unchanged ones. The focus of this article is how to estimate the proportion unchanged and the false discovery rate (FDR) and how to make inferences based on these concepts. Six published methods for estimating the proportion unchanged genes are reviewed, two alternatives are presented, and all are tested on both simulated and real data. All estimates but one make do without any parametric assumptions concerning the distributions of the p-values. Furthermore, the estimation and use of the FDR and the closely related q-value is illustrated with examples. Five published estimates of the FDR and one new are presented and tested. Implementations in R code are available.
Results: A simulation model based on the distribution of real microarray data plus two real data sets were used to assess the methods. The proposed alternative methods for estimating the proportion unchanged fared very well, and gave evidence of low bias and very low variance. Different methods perform well depending upon whether there are few or many regulated genes. Furthermore, the methods for estimating FDR showed a varying performance, and were sometimes misleading. The new method had a very low error.
Conclusion: The concept of the q-value or false discovery rate is useful in practical research, despite some theoretical and practical shortcomings. However, it seems possible to challenge the performance of the published methods, and there is likely scope for further developing the estimates of the FDR. The new methods provide the scientist with more options to choose a suitable method for any particular experiment. The article advocates the use of the conjoint information regarding false positive and negative rates as well as the proportion unchanged when identifying changed genes.
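The two quantities named in the abstract can be illustrated with a minimal sketch. This record does not say which estimators the article reviews or proposes, so the code below is not the article's method: it uses a common threshold-based estimate of the proportion unchanged and a plug-in FDR estimate, assuming that p-values of unchanged genes are uniform on [0, 1]. The function names, the cut-off `lam`, and the simulated mixture are all assumptions made for illustration.

```python
import numpy as np


def estimate_pi0(pvalues, lam=0.5):
    """Threshold-based estimate of the proportion of unchanged genes.

    Assumes null p-values are uniform on [0, 1], so the share of
    p-values above `lam` is roughly pi0 * (1 - lam).
    """
    p = np.asarray(pvalues, dtype=float)
    pi0 = np.mean(p > lam) / (1.0 - lam)
    return float(min(pi0, 1.0))  # a proportion cannot exceed 1


def estimate_fdr(pvalues, threshold, pi0=1.0):
    """Plug-in FDR estimate when calling all genes with p <= threshold."""
    p = np.asarray(pvalues, dtype=float)
    n_called = max(int(np.sum(p <= threshold)), 1)  # avoid division by zero
    expected_false = pi0 * threshold * p.size       # expected null calls
    return float(min(expected_false / n_called, 1.0))


# Simulated mixture (illustrative only): 80% unchanged genes with uniform
# p-values, 20% changed genes with p-values concentrated near zero.
rng = np.random.default_rng(0)
p = np.concatenate([rng.uniform(size=8000), rng.beta(0.1, 10.0, size=2000)])

pi0_hat = estimate_pi0(p)
fdr_hat = estimate_fdr(p, threshold=0.01, pi0=pi0_hat)
print(f"estimated proportion unchanged: {pi0_hat:.3f}")
print(f"estimated FDR at p <= 0.01:     {fdr_hat:.3f}")
```

Setting `pi0=1.0` gives the conservative version of the FDR estimate; plugging in an estimated proportion unchanged tightens it, which is one reason the two quantities are usually estimated together.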
url http://www.biomedcentral.com/1471-2105/6/199
work_keys_str_mv AT brobergper acomparativereviewofestimatesoftheproportionunchangedgenesandthefalsediscoveryrate
AT brobergper comparativereviewofestimatesoftheproportionunchangedgenesandthefalsediscoveryrate
_version_ 1724752675261644800