Re-sampling strategy to improve the estimation of number of null hypotheses in FDR control under strong correlation structures

Abstract. Background: When conducting multiple hypothesis tests, it is important to control the number of false positives, or the False Discovery Rate (FDR). However, there is a tradeoff between controlling FDR and maximizing power. Several methods have b...


Bibliographic Details
Main Authors:	Lu Xin, Perkins David L
Format:	Article
Language:	English
Published:	BMC 2007-05-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/8/157

id	doaj-f72971c91247490b9147ab100fd9fcba
record_format	Article
spelling	doaj-f72971c91247490b9147ab100fd9fcba; 2020-11-25T00:22:19Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2007-05-01; 8; 1; 157; 10.1186/1471-2105-8-157; Re-sampling strategy to improve the estimation of number of null hypotheses in FDR control under strong correlation structures; Lu Xin; Perkins David L; http://www.biomedcentral.com/1471-2105/8/157
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Lu Xin Perkins David L
spellingShingle	Lu Xin Perkins David L Re-sampling strategy to improve the estimation of number of null hypotheses in FDR control under strong correlation structures BMC Bioinformatics
author_facet	Lu Xin Perkins David L
author_sort	Lu Xin
title	Re-sampling strategy to improve the estimation of number of null hypotheses in FDR control under strong correlation structures
title_sort	re-sampling strategy to improve the estimation of number of null hypotheses in fdr control under strong correlation structures
publisher	BMC
series	BMC Bioinformatics
issn	1471-2105
publishDate	2007-05-01
description	Abstract. Background: When conducting multiple hypothesis tests, it is important to control the number of false positives, or the False Discovery Rate (FDR). However, there is a tradeoff between controlling FDR and maximizing power. Several methods, such as the q-value method, have been proposed to estimate the proportion of true null hypotheses among the tested hypotheses and to use this estimate in the control of FDR. These methods usually depend on the assumption that the test statistics are independent (or only weakly correlated). However, many types of data, for example microarray data, often contain large-scale correlation structures. Our objective was to develop methods to control the FDR while maintaining a greater level of power in highly correlated datasets by improving the estimation of the proportion of null hypotheses. Results: We showed that when strong correlation exists among the data, which is common in microarray datasets, the estimate of the proportion of null hypotheses can be highly variable, resulting in a high level of variation in the FDR. Therefore, we developed a re-sampling strategy that reduces this variation by breaking the correlations between gene expression values, and then uses a conservative strategy of selecting the upper quartile of the re-sampling estimates to obtain strong control of FDR. Conclusion: In simulation studies and perturbations of actual microarray datasets, our method, compared to competing methods such as q-value, generated slightly biased estimates of the proportion of null hypotheses but with lower mean square errors. When selecting genes at the same controlled FDR level, our methods have on average a significantly lower false discovery rate in exchange for a minor reduction in power.
url	http://www.biomedcentral.com/1471-2105/8/157
work_keys_str_mv	AT luxin resamplingstrategytoimprovetheestimationofnumberofnullhypothesesinfdrcontrolunderstrongcorrelationstructures AT perkinsdavidl resamplingstrategytoimprovetheestimationofnumberofnullhypothesesinfdrcontrolunderstrongcorrelationstructures
_version_	1725360441378471936
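
The abstract describes the approach only in outline: estimate the proportion of null hypotheses on each of several re-sampled datasets, then take the upper quartile of those estimates as a conservative value. The following is a minimal Python sketch of that final aggregation step, not the authors' implementation. It assumes a fixed-threshold Storey-type estimator (the q-value method itself tunes the threshold), and the names `storey_pi0`, `upper_quartile_pi0`, and the parameter `lam` are hypothetical. How the re-sampled datasets are built so as to break the correlations is not stated in this record, so the sketch simply takes one list of p-values per re-sampled dataset as input.

```python
from statistics import quantiles


def storey_pi0(pvalues, lam=0.5):
    """Fixed-threshold Storey-type estimate of the proportion of true nulls.

    Assumption: a single threshold `lam`; this is a simplification, not the
    estimator used in the paper.
    """
    if not pvalues:
        raise ValueError("pvalues must be non-empty")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    # P-values above lam are assumed to come mostly from true nulls.
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (len(pvalues) * (1.0 - lam)))


def upper_quartile_pi0(resampled_pvalue_sets, lam=0.5):
    """Upper quartile of the per-resample estimates (conservative choice).

    Each element of `resampled_pvalue_sets` is the list of p-values from one
    re-sampled dataset; at least two sets are required.
    """
    estimates = [storey_pi0(pv, lam) for pv in resampled_pvalue_sets]
    # quantiles(..., n=4) returns the three quartile cut points.
    return quantiles(estimates, n=4, method="inclusive")[2]
```

Taking the upper quartile rather than the mean or median deliberately biases the estimate upward, which matches the abstract's description of trading a small loss of power for stronger FDR control.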


Similar Items