Summary: | <p>Abstract</p> <p>Background</p> <p>Many researchers use the double filtering procedure with fold change and <it>t </it>test to identify differentially expressed genes, in the hope that the double filtering will provide extra confidence in the results. Due to its simplicity, the double filtering procedure has been popular with applied researchers despite the development of more sophisticated methods.</p> <p>Results</p> <p>This paper, for the first time to our knowledge, provides theoretical insight on the drawback of the double filtering procedure. We show that fold change assumes all genes to have a common variance while <it>t </it>statistic assumes gene-specific variances. The two statistics are based on contradicting assumptions. Under the assumption that gene variances arise from a mixture of a common variance and gene-specific variances, we develop the theoretically most powerful likelihood ratio test statistic. We further demonstrate that the posterior inference based on a Bayesian mixture model and the widely used significance analysis of microarrays (SAM) statistic are better approximations to the likelihood ratio test than the double filtering procedure.</p> <p>Conclusion</p> <p>We demonstrate through hypothesis testing theory, simulation studies and real data examples, that well constructed shrinkage testing methods, which can be united under the mixture gene variance assumption, can considerably outperform the double filtering procedure.</p>
|