Estimating the Local False Discovery Rate via a Bootstrap Solution to the Reference Class Problem: Application to Genetic Association Data

Modern scientific technologies such as microarrays, imaging devices, genome-wide association studies, and social science surveys provide statisticians with hundreds or even thousands of tests to consider simultaneously. Testing many thousands of null hypotheses may increase the number of Type I errors...

Bibliographic Details
Main Author: Abbas Aghababazadeh, Farnoosh
Other Authors: Bickel, David
Language: en
Published: Université d'Ottawa / University of Ottawa 2015
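Subjects: Multiple Testing; Local False Discovery Rate; Reference Class; Bias-variance Trade Off; Tuning Parameter; Bootstrap Approach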
Online Access: http://hdl.handle.net/10393/33367
http://dx.doi.org/10.20381/ruor-3974
id ndltd-uottawa.ca-oai-ruor.uottawa.ca-10393-33367
record_format oai_dc
spelling ndltd-uottawa.ca-oai-ruor.uottawa.ca-10393-33367 2018-01-05T19:02:30Z Estimating the Local False Discovery Rate via a Bootstrap Solution to the Reference Class Problem: Application to Genetic Association Data Abbas Aghababazadeh, Farnoosh Bickel, David Alvo, Mayer Multiple Testing Local False Discovery Rate Reference Class Bias-variance Trade Off Tuning Parameter Bootstrap Approach 2015-11-25T19:48:09Z 2015-11-25T19:48:09Z 2015 2015 Thesis http://hdl.handle.net/10393/33367 http://dx.doi.org/10.20381/ruor-3974 en Université d'Ottawa / University of Ottawa
collection NDLTD
language en
sources NDLTD
topic Multiple Testing
Local False Discovery Rate
Reference Class
Bias-variance Trade Off
Tuning Parameter
Bootstrap Approach
description Modern scientific technologies such as microarrays, imaging devices, genome-wide association studies, and social science surveys provide statisticians with hundreds or even thousands of tests to consider simultaneously. Testing many thousands of null hypotheses may increase the number of Type I errors. In large-scale hypothesis testing, researchers can use different statistical techniques, such as family-wise error rates, false discovery rates, permutation methods, and the local false discovery rate, in which all available data are usually analyzed together. In applications, the thousands of tests are related by a scientifically meaningful structure. Ignoring that structure can be misleading, as it may increase the number of false positives and false negatives. For example, in genome-wide association studies each test corresponds to a specific genetic marker, and the scientific structure for each marker can be its minor allele frequency. In this research, the local false discovery rate is used to analyze the thousands of tests together. We present a model for multiple hypothesis testing in which the scientific structure of each test is incorporated as a covariate, with the aim of improving the performance of testing procedures. The method yields different estimates depending on a tuning parameter, and we would like to estimate the optimal value of that parameter from the observed statistics. Thus, among those estimators, the one that minimizes the estimated errors due to bias and variance is chosen by applying the bootstrap approach. Such an estimation method is called an adaptive reference class method. Under the combined reference class method, by contrast, the effect of the covariates is ignored and all null hypotheses are analyzed together.
In this research, under some assumptions on the covariates and the prior probabilities, the proposed adaptive reference class method shows smaller error than the combined reference class method in estimating the local false discovery rate when the number of tests gets large. We apply the adaptive reference class method to coronary artery disease data, and we use simulated data to evaluate the performance of the estimator associated with the adaptive reference class method.
author2 Bickel, David
author_facet Bickel, David
Abbas Aghababazadeh, Farnoosh
author Abbas Aghababazadeh, Farnoosh
author_sort Abbas Aghababazadeh, Farnoosh
title Estimating the Local False Discovery Rate via a Bootstrap Solution to the Reference Class Problem: Application to Genetic Association Data
publisher Université d'Ottawa / University of Ottawa
publishDate 2015
url http://hdl.handle.net/10393/33367
http://dx.doi.org/10.20381/ruor-3974