Modern simulation utilities for genetic analysis

Background: Statistical geneticists employ simulation to estimate the power of proposed studies, test new analysis tools, and evaluate properties of causal models. Although there are existing trait simulators, there is ample room for modernization. For example, most phenotype simulators are limited...

Bibliographic Details
Main Authors: German, C.A. (Author), Ji, S.S. (Author), Lange, K. (Author), Sinsheimer, J.S. (Author), Sobel, E.M. (Author), Zhou, H. (Author), Zhou, J. (Author)
Format: Article
Language: English
Published: BioMed Central Ltd 2021
Online Access: View Fulltext in Publisher (https://doi.org/10.1186/s12859-021-04086-8)
LEADER 04471nam a2200709Ia 4500
001 10.1186-s12859-021-04086-8
008 220427s2021 CNT 000 0 und d
020 |a 14712105 (ISSN) 
245 1 0 |a Modern simulation utilities for genetic analysis 
260 0 |b BioMed Central Ltd  |c 2021 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1186/s12859-021-04086-8 
520 3 |a Background: Statistical geneticists employ simulation to estimate the power of proposed studies, test new analysis tools, and evaluate properties of causal models. Although there are existing trait simulators, there is ample room for modernization. For example, most phenotype simulators are limited to Gaussian traits or traits transformable to normality, while ignoring qualitative traits and realistic, non-normal trait distributions. Also, modern computer languages, such as Julia, that accommodate parallelization and cloud-based computing are now mainstream but rarely used in older applications. To meet the challenges of contemporary big studies, it is important for geneticists to adopt new computational tools. Results: We present TraitSimulation, an open-source Julia package that makes it trivial to quickly simulate phenotypes under a variety of genetic architectures. This package is integrated into our OpenMendel suite for easy downstream analyses. Julia was purpose-built for scientific programming and provides tremendous speed and memory efficiency, easy access to multi-CPU and GPU hardware, and to distributed and cloud-based parallelization. TraitSimulation is designed to encourage flexible trait simulation, including via the standard devices of applied statistics, generalized linear models (GLMs) and generalized linear mixed models (GLMMs). TraitSimulation also accommodates many study designs: unrelateds, sibships, pedigrees, or a mixture of all three. (Of course, for data with pedigrees or cryptic relationships, the simulation process must include the genetic dependencies among the individuals.) We consider an assortment of trait models and study designs to illustrate integrated simulation and analysis pipelines. Step-by-step instructions for these analyses are available in our electronic Jupyter notebooks on GitHub. These interactive notebooks are ideal for reproducible research. Conclusion: The TraitSimulation package has three main advantages. (1) It leverages the computational efficiency and ease of use of Julia to provide extremely fast, straightforward simulation of even the most complex genetic models, including GLMs and GLMMs. (2) It can be operated entirely within, but is not limited to, the integrated analysis pipeline of OpenMendel. And finally (3), by allowing a wider range of more realistic phenotype models, TraitSimulation brings power calculations and diagnostic tools closer to what investigators might see in real-world analyses. © 2021, The Author(s).
650 0 4 |a adult 
650 0 4 |a aged 
650 0 4 |a Aged 
650 0 4 |a article 
650 0 4 |a calculation 
650 0 4 |a Cloud based computing 
650 0 4 |a cloud computing 
650 0 4 |a Cloud Computing 
650 0 4 |a Computational efficiency 
650 0 4 |a computer language 
650 0 4 |a computer simulation 
650 0 4 |a Computer Simulation 
650 0 4 |a Efficiency 
650 0 4 |a Generalized linear mixed models 
650 0 4 |a Generalized linear model 
650 0 4 |a genetic analysis 
650 0 4 |a Genetic architecture 
650 0 4 |a genetic model 
650 0 4 |a Genetic programming 
650 0 4 |a genetic screening 
650 0 4 |a Genetic Testing 
650 0 4 |a geneticist 
650 0 4 |a human 
650 0 4 |a Humans 
650 0 4 |a Integrated simulations 
650 0 4 |a memory 
650 0 4 |a Open source software 
650 0 4 |a pedigree 
650 0 4 |a Pedigree 
650 0 4 |a phenotype 
650 0 4 |a Phenotype 
650 0 4 |a pipeline 
650 0 4 |a Pipelines 
650 0 4 |a Power 
650 0 4 |a Realistic genetic models 
650 0 4 |a Reproducible research 
650 0 4 |a Scientific programming 
650 0 4 |a simulation 
650 0 4 |a Statistical genetics 
650 0 4 |a Step-by-step instructions 
650 0 4 |a Trait simulation 
650 0 4 |a velocity 
700 1 |a German, C.A.  |e author 
700 1 |a Ji, S.S.  |e author 
700 1 |a Lange, K.  |e author 
700 1 |a Sinsheimer, J.S.  |e author 
700 1 |a Sobel, E.M.  |e author 
700 1 |a Zhou, H.  |e author 
700 1 |a Zhou, J.  |e author 
773 |t BMC Bioinformatics