Estimating the Local False Discovery Rate via a Bootstrap Solution to the Reference Class Problem: Application to Genetic Association Data
Modern scientific technology such as microarrays, imaging devices, genome-wide association studies or social science surveys provide statisticians with hundreds or even thousands of tests to consider simultaneously. Testing many thousands of null hypotheses may increase the number of Type $I$ errors...
Main Author: | |
---|---|
Other Authors: | |
Language: | en |
Published: |
Université d'Ottawa / University of Ottawa
2015
|
Subjects: | |
Online Access: | http://hdl.handle.net/10393/33367 http://dx.doi.org/10.20381/ruor-3974 |
id |
ndltd-uottawa.ca-oai-ruor.uottawa.ca-10393-33367 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-uottawa.ca-oai-ruor.uottawa.ca-10393-333672018-01-05T19:02:30Z Estimating the Local False Discovery Rate via a Bootstrap Solution to the Reference Class Problem: Application to Genetic Association Data Abbas Aghababazadeh, Farnoosh Bickel, David Alvo, Mayer Multiple Testing Local False Discovery Rtae Reference Class Bias-variance Trade Off Tuning Parameter Bootstrap Approach Modern scientific technology such as microarrays, imaging devices, genome-wide association studies or social science surveys provide statisticians with hundreds or even thousands of tests to consider simultaneously. Testing many thousands of null hypotheses may increase the number of Type $I$ errors. In large-scale hypothesis testing, researchers can use different statistical techniques such as family-wise error rates, false discovery rates, permutation methods, local false discovery rate, where all available data usually should be analyzed together. In applications, the thousands of tests are related by a scientifically meaningful structure. Ignoring that structure can be misleading as it may increase the number of false positives and false negatives. As an example, in genome-wide association studies each test corresponds to a specific genetic marker. In such a case, the scientific structure for each genetic marker can be its minor allele frequency. In this research, the local false discovery rate as a relevant statistical approach is considered to analyze the thousands of tests together. We present a model for multiple hypothesis testing when the scientific structure of each test is incorporated as a co-variate. The purpose of this model is to incorporate the co-variate to improve the performance of testing procedures. The method we consider has different estimates depending on the tuning parameter. We would like to estimate the optimal value of that parameter by considering observed statistics. Thus, among those estimators, the one which minimizes the estimated errors due to bias and to variance is chosen by applying the bootstrap approach. Such an estimation method is called an adaptive reference class method. Under the combined reference class method, the effect of the co-variates is ignored and all null hypotheses should be analyzed together. In this research, under some assumptions for the co-variates and the prior probabilities, the proposed adaptive reference class method shows smaller error than the combined reference class method in estimating the local false discovery rate, when the number of tests gets large. We describe the adaptive reference class method to the coronary artery disease data, and we use simulation data to evaluate the performance of the estimator associated with the adaptive reference class method. 2015-11-25T19:48:09Z 2015-11-25T19:48:09Z 2015 2015 Thesis http://hdl.handle.net/10393/33367 http://dx.doi.org/10.20381/ruor-3974 en Université d'Ottawa / University of Ottawa |
collection |
NDLTD |
language |
en |
sources |
NDLTD |
topic |
Multiple Testing Local False Discovery Rtae Reference Class Bias-variance Trade Off Tuning Parameter Bootstrap Approach |
spellingShingle |
Multiple Testing Local False Discovery Rtae Reference Class Bias-variance Trade Off Tuning Parameter Bootstrap Approach Abbas Aghababazadeh, Farnoosh Estimating the Local False Discovery Rate via a Bootstrap Solution to the Reference Class Problem: Application to Genetic Association Data |
description |
Modern scientific technology such as microarrays, imaging devices, genome-wide association studies or social science surveys provide statisticians with hundreds or even thousands of tests to consider simultaneously. Testing many thousands of null hypotheses may increase the number of Type $I$ errors. In large-scale hypothesis testing, researchers can use different statistical techniques such as family-wise error rates, false discovery rates, permutation methods, local false discovery rate, where all available data usually should be analyzed together. In applications, the thousands of tests are related by a scientifically meaningful structure. Ignoring that structure can be misleading as it may increase the number of false positives and false negatives. As an example, in genome-wide association studies each test corresponds to a specific genetic marker. In such a case, the scientific structure for each genetic marker can be its minor allele frequency.
In this research, the local false discovery rate as a relevant statistical approach is considered to analyze the thousands of tests together. We present a model for multiple hypothesis testing when the scientific structure of each test is incorporated as a co-variate. The purpose of this model is to incorporate the co-variate to improve the performance of testing procedures. The method we consider has different estimates depending on the tuning parameter. We would like to estimate the optimal value of that parameter by considering observed statistics. Thus, among those estimators, the one which minimizes the estimated errors due to bias and to variance is chosen by applying the bootstrap approach. Such an estimation method is called an adaptive reference class method. Under the combined reference class method, the effect of the co-variates is ignored and all null hypotheses should be analyzed together.
In this research, under some assumptions for the co-variates and the prior probabilities, the proposed adaptive reference class method shows smaller error than the combined reference class method in estimating the local false discovery rate, when the number of tests gets large. We describe the adaptive reference class method to the coronary artery disease data, and we use simulation data to evaluate the performance of the estimator associated with the adaptive reference class method. |
author2 |
Bickel, David |
author_facet |
Bickel, David Abbas Aghababazadeh, Farnoosh |
author |
Abbas Aghababazadeh, Farnoosh |
author_sort |
Abbas Aghababazadeh, Farnoosh |
title |
Estimating the Local False Discovery Rate via a Bootstrap Solution to the Reference Class Problem: Application to Genetic Association Data |
title_short |
Estimating the Local False Discovery Rate via a Bootstrap Solution to the Reference Class Problem: Application to Genetic Association Data |
title_full |
Estimating the Local False Discovery Rate via a Bootstrap Solution to the Reference Class Problem: Application to Genetic Association Data |
title_fullStr |
Estimating the Local False Discovery Rate via a Bootstrap Solution to the Reference Class Problem: Application to Genetic Association Data |
title_full_unstemmed |
Estimating the Local False Discovery Rate via a Bootstrap Solution to the Reference Class Problem: Application to Genetic Association Data |
title_sort |
estimating the local false discovery rate via a bootstrap solution to the reference class problem: application to genetic association data |
publisher |
Université d'Ottawa / University of Ottawa |
publishDate |
2015 |
url |
http://hdl.handle.net/10393/33367 http://dx.doi.org/10.20381/ruor-3974 |
work_keys_str_mv |
AT abbasaghababazadehfarnoosh estimatingthelocalfalsediscoveryrateviaabootstrapsolutiontothereferenceclassproblemapplicationtogeneticassociationdata |
_version_ |
1718598436584947712 |