Summary: | Association studies investigating the causal mechanisms of complex diseases often require a large number of participants because the causal effects involved in these conditions are generally weak. The sample sizes necessary for adequately powered analyses are mainly achieved by large studies such as consortia and biobanks. Setting up such a study can be expensive, so it is important to identify the correct sample size. However, analysing the statistical power of large consortia and major biobanks requires that a number of complicating issues are properly taken into account. These include the impact of unmeasured aetiological determinants and the quality of measurement of both outcome and explanatory variables. Conventional power calculations use closed-form solutions that are not flexible enough to take these elements into account easily, which can result in a substantial overestimation of the actual power. In this thesis, I describe the radical rebuilding of an existing power calculator known as ESPRESSO to develop and implement the ESPRESSO-forte algorithm. ESPRESSO-forte is intended as a comprehensive study simulation platform to support the design of large-scale association studies and biobanks. I then applied the newly developed software to two real-world scientific problems: (1) to assess the power of a large multi-provincial Canadian cohort for the study of quantitative traits; and (2) to estimate the impact of the particular standard operating procedures applied to the collection and processing of biosamples in UK Biobank on the likely power of future nested case-control studies. Some analyses now explore the role of copy-number variants (CNVs) in disease. I therefore evaluated the accuracy of CNV genotypes measured on four SNP genotyping platforms to inform future studies that plan to use existing SNP intensity data to measure CNVs or to carry out de novo CNV measurements on SNP genotyping platforms.
|