Using mixture beta models to estimate the proportion of true null hypotheses in the multiple hypotheses testing

Bibliographic Details
Main Authors: LIN, YU-HSING, 林育興
Other Authors: WANG, CHUN-CHAO
Format: Others
Language: zh-TW
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/47321897616250445584
Description
Summary: Master's thesis, National Taipei University, Department of Statistics, academic year 98. Microarrays can be used to detect the expression of thousands of genes. After gene expression is measured, investigators may try to identify the genes that are differentially expressed across groups. Controlling the overall Type I error (α) is an important issue in multiple hypothesis testing, and the family-wise error rate (FWER) and the false discovery rate (FDR) are the criteria commonly used to define it. Benjamini & Hochberg (1995) developed an FDR-controlling procedure (the BH procedure). However, as the number of true alternative hypotheses increases, the BH procedure becomes very conservative, because for independent tests it controls the FDR at level π0·α rather than α. Therefore, Benjamini & Hochberg (2000) proposed an adaptive FDR-controlling procedure (the adaptive BH procedure) that incorporates the proportion of true null hypotheses (π0), and this procedure has been shown to control the error rate. Nevertheless, π0 is unknown, so how to estimate π0 is the pivotal issue. Allison et al. (2002) proposed modeling the p-values from the multiple hypotheses with a mixture beta model. Under the null hypothesis, p-values are uniformly distributed on the interval [0,1]; under the alternative hypothesis, their distribution can be modeled as a mixture of V separate beta components. In their simulations, Allison et al. (2002) modeled the p-values as a uniform distribution plus a single beta distribution. However, as the correlation between gene expression levels increases, the p-values from the null hypotheses tend to cluster closer to one than to zero, so the uniform distribution is no longer appropriate for the null component. This thesis therefore suggests replacing the uniform distribution with a general beta distribution in the mixture beta model. Monte Carlo simulations show that the model without the uniform distribution performs more robustly and accurately when gene expression levels are highly correlated.
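The mixture beta model described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it fits a two-component mixture (V = 1) by EM, and the M-step uses weighted method-of-moments updates for the beta parameters instead of full maximum likelihood, an assumption made to keep the sketch dependency-free. The names `fit_pi0`, `beta_pdf`, `moment_match` and the flag `free_null` are hypothetical. With `free_null=False` the null component is fixed at Uniform(0, 1) = Beta(1, 1), as in the simulations of Allison et al. (2002); with `free_null=True` it is a beta distribution with free parameters, as the thesis suggests. Which component counts as "null" is fixed only by the starting values (null at Beta(1, 1), alternative skewed toward 0), which is another assumption. The estimate of π0 is the fitted weight of the null component.

```python
import math
import random


def beta_pdf(p, a, b):
    """Density of Beta(a, b) at p, evaluated on the log scale for stability."""
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(p) + (b - 1) * math.log1p(-p) - log_beta_fn)


def moment_match(ps, weights):
    """Weighted method-of-moments estimate of (a, b) for one beta component.

    Assumption of this sketch: used as an approximate M-step in place of
    maximum likelihood. Small floors keep both parameters positive.
    """
    total = sum(weights) or 1e-12
    mean = sum(w * p for p, w in zip(ps, weights)) / total
    var = sum(w * (p - mean) ** 2 for p, w in zip(ps, weights)) / total
    common = max(mean * (1 - mean) / max(var, 1e-12) - 1, 1e-2)
    return max(mean * common, 1e-3), max((1 - mean) * common, 1e-3)


def fit_pi0(pvalues, free_null=False, iters=200, tol=1e-8):
    """EM fit of f(p) = pi0 * g0(p) + (1 - pi0) * Beta(p; a1, b1).

    free_null=False: g0 is Uniform(0, 1), i.e. Beta(1, 1), kept fixed.
    free_null=True:  g0 is Beta(a0, b0) with parameters re-estimated each step.
    Returns (pi0, (a0, b0), (a1, b1)).
    """
    eps = 1e-12
    ps = [min(max(p, eps), 1 - eps) for p in pvalues]  # keep log(p), log(1-p) finite
    pi0, null, alt = 0.5, (1.0, 1.0), (0.5, 2.0)  # alternative starts skewed toward 0
    for _ in range(iters):
        # E-step: posterior probability that each p-value belongs to the null component
        resp = []
        for p in ps:
            f0 = pi0 * beta_pdf(p, *null)
            f1 = (1 - pi0) * beta_pdf(p, *alt)
            resp.append(f0 / (f0 + f1) if f0 + f1 > 0 else 0.5)
        # M-step: update the mixing weight, then the component parameters
        new_pi0 = sum(resp) / len(ps)
        alt = moment_match(ps, [1 - r for r in resp])
        if free_null:
            null = moment_match(ps, resp)
        converged = abs(new_pi0 - pi0) < tol
        pi0 = new_pi0
        if converged:
            break
    return pi0, null, alt


if __name__ == "__main__":
    random.seed(1)
    # Illustrative, independent data: 800 uniform null p-values, 200 skewed toward 0.
    pvals = [random.random() for _ in range(800)]
    pvals += [random.betavariate(0.2, 4.0) for _ in range(200)]
    print("uniform + beta:", round(fit_pi0(pvals)[0], 3))
    print("beta + beta:   ", round(fit_pi0(pvals, free_null=True)[0], 3))
```

The synthetic p-values in the usage lines are independent and purely illustrative; they do not reproduce the correlated settings studied in the thesis, where the fixed uniform null component is expected to fit poorly.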
The thesis also compares the proposed model with the π0 estimation method of Benjamini & Hochberg (2000).
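The comparison method can be sketched as follows, assuming the lowest-slope estimator of m0 = π0·m as it is commonly described for Benjamini & Hochberg (2000): sort the p-values, compute the slopes S_i = (1 - p_(i)) / (m + 1 - i), stop at the first i where S_i < S_(i-1), and set m0_hat = min(ceil(1/S_i + 1), m). The rounding, the fallback to m0_hat = m when no decrease occurs, and running the adaptive step only when plain BH rejects at least one hypothesis follow that description but should be treated as assumptions here; the function names are hypothetical. `bh_reject` is the BH (1995) step-up rule, and passing an estimate of m0 turns it into the adaptive BH procedure.

```python
import math


def bh_reject(pvalues, alpha, m0=None):
    """Benjamini-Hochberg step-up rule.

    Rejects the k smallest p-values, where k is the largest rank with
    p_(k) <= k * alpha / m0. With m0 = m (the default) this is the original
    BH procedure; an estimate of m0 gives the adaptive version.
    Returns the sorted indices of the rejected hypotheses.
    """
    m = len(pvalues)
    m0 = m if m0 is None else m0
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m0:
            k = rank
    return sorted(order[:k])


def lowest_slope_m0(pvalues):
    """Lowest-slope estimate of the number of true null hypotheses m0."""
    p = sorted(pvalues)
    m = len(p)
    prev = None
    for i in range(1, m + 1):
        s = (1 - p[i - 1]) / (m + 1 - i)
        if prev is not None and s < prev:
            # A zero slope means 1/S is unbounded, so the cap at m applies.
            return m if s == 0 else min(math.ceil(1 / s + 1), m)
        prev = s
    return m  # assumption: no decrease in the slopes -> use all m hypotheses


def adaptive_bh(pvalues, alpha):
    """Run BH; if it rejects anything, rerun it with m replaced by the m0 estimate."""
    if not bh_reject(pvalues, alpha):
        return []
    return bh_reject(pvalues, alpha, lowest_slope_m0(pvalues))


if __name__ == "__main__":
    pvals = [0.5, 0.001, 0.9, 0.002, 0.003, 0.6, 0.004, 0.005, 0.8, 0.006]
    print(lowest_slope_m0(pvals))    # → 9
    print(adaptive_bh(pvals, 0.05))  # → [1, 3, 4, 6, 7, 9]
```

In this small example the slopes increase over the six smallest p-values and first drop at p = 0.5, giving S = 0.125 and m0_hat = 9; the adaptive thresholds are then slightly looser than plain BH, although both reject the same six hypotheses here.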