Statistical Methodology for Sequence Analysis
Rare disease variants have received increasing attention in the past few years as a potential cause of many complex diseases, after common disease variants left a large part of the heritability unexplained (the "missing heritability"). With the advancement in sequencing techniques as well as computational capab...
Main Author: | Adhikari, Kaustubh |
---|---|
Other Authors: | Lange, Christoph |
Language: | en_US |
Published: | Harvard University 2012 |
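Subjects: | biostatistics; Bayesian modeling; common variants; genetic association; rare variants; statistical methodology |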
Online Access: | http://dissertations.umi.com/gsas.harvard:10178 http://nrs.harvard.edu/urn-3:HUL.InstRepos:9288545 |
id |
ndltd-harvard.edu-oai-dash.harvard.edu-1-9288545 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-harvard.edu-oai-dash.harvard.edu-1-9288545 2015-08-14T15:41:31Z Statistical Methodology for Sequence Analysis Adhikari, Kaustubh biostatistics Bayesian modeling common variants genetic association rare variants statistical methodology Lange, Christoph 2012-07-24T13:15:02Z 2012-07-24 2012 2012-07-24T13:15:02Z Thesis or Dissertation Adhikari, Kaustubh. 2012. Statistical Methodology for Sequence Analysis. Doctoral dissertation, Harvard University. http://dissertations.umi.com/gsas.harvard:10178 http://nrs.harvard.edu/urn-3:HUL.InstRepos:9288545 en_US open http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA Harvard University |
collection |
NDLTD |
language |
en_US |
sources |
NDLTD |
topic |
biostatistics Bayesian modeling common variants genetic association rare variants statistical methodology |
spellingShingle |
biostatistics Bayesian modeling common variants genetic association rare variants statistical methodology Adhikari, Kaustubh Statistical Methodology for Sequence Analysis |
description |
Rare disease variants have received increasing attention in the past few years as a potential cause of many complex diseases, after common disease variants left a large part of the heritability unexplained (the "missing heritability"). With advances in sequencing techniques and computational capabilities, statistical methodology for analyzing rare variants is now an active research topic, especially in case-control association studies. In this thesis, we first present two related statistical methodologies designed for case-control studies to predict the number of common and rare variants in a particular genomic region underlying a complex disease. Genome-wide association studies (GWAS) are now routinely performed to identify a few putative marker loci or a candidate region for further analysis. The two methods are designed to work with SNP data on such a GWAS-highlighted genomic region to search for potential disease variants. The fundamental idea is to use Bayesian methodology to obtain bivariate posterior distributions on the counts of common and rare variants. While the first method uses randomly generated (minimal) ancestral recombination graphs, the second uses an ensemble clustering method to explore the space of genealogical trees that represent the inherent structure in the test subjects. In contrast to these methods, which work with SNP data, the third chapter deals with next-generation sequencing data to detect the presence of rare variants in a genomic region. We present a non-parametric statistical methodology for rare variant association testing, using the well-known Kolmogorov-Smirnov framework adapted for genetic data. It is a fast, model-free, robust statistic designed for situations where both deleterious and protective variants are present. It is also unique in utilizing the variant locations in the test statistic. |
author2 |
Lange, Christoph |
author_facet |
Lange, Christoph Adhikari, Kaustubh |
author |
Adhikari, Kaustubh |
author_sort |
Adhikari, Kaustubh |
title |
Statistical Methodology for Sequence Analysis |
title_short |
Statistical Methodology for Sequence Analysis |
title_full |
Statistical Methodology for Sequence Analysis |
title_fullStr |
Statistical Methodology for Sequence Analysis |
title_full_unstemmed |
Statistical Methodology for Sequence Analysis |
title_sort |
statistical methodology for sequence analysis |
publisher |
Harvard University |
publishDate |
2012 |
url |
http://dissertations.umi.com/gsas.harvard:10178 http://nrs.harvard.edu/urn-3:HUL.InstRepos:9288545 |
work_keys_str_mv |
AT adhikarikaustubh statisticalmethodologyforsequenceanalysis |
_version_ |
1716816287116558336 |
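
The description mentions a rare variant association test built on the Kolmogorov-Smirnov framework that uses variant locations. The record does not give the thesis's actual statistic, so the following is only a minimal sketch of the generic two-sample Kolmogorov-Smirnov idea: compare the empirical distribution of rare-variant positions observed in cases with that observed in controls. The function names, the toy positions, and the permutation scheme (shuffling positions between groups rather than shuffling subject labels) are all illustrative assumptions, not the author's method.

```python
import bisect
import random


def ks_statistic(case_positions, control_positions):
    """Generic two-sample KS statistic: largest gap between the two
    empirical CDFs of variant positions (illustrative, not the thesis's
    adapted statistic)."""
    cases = sorted(case_positions)
    controls = sorted(control_positions)
    if not cases or not controls:
        raise ValueError("both groups need at least one variant position")
    d = 0.0
    # The empirical CDFs only change at observed positions, so it is
    # enough to evaluate the gap at each distinct position.
    for x in set(cases) | set(controls):
        f_case = bisect.bisect_right(cases, x) / len(cases)
        f_ctrl = bisect.bisect_right(controls, x) / len(controls)
        d = max(d, abs(f_case - f_ctrl))
    return d


def permutation_p_value(case_positions, control_positions, n_perm=2000, seed=0):
    """Permutation p-value for the KS statistic.

    Assumption: positions are exchangeable between groups under the null.
    A real analysis would more likely permute case/control labels of
    subjects, keeping each subject's variants together."""
    rng = random.Random(seed)
    observed = ks_statistic(case_positions, control_positions)
    pooled = list(case_positions) + list(control_positions)
    n_case = len(case_positions)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:n_case], pooled[n_case:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical base-pair offsets of rare alleles within one region.
toy_cases = [120, 150, 160, 900, 950]
toy_controls = [400, 420, 450, 480, 500]
d_obs, p_val = permutation_p_value(toy_cases, toy_controls)
print(f"D = {d_obs:.3f}, permutation p = {p_val:.4f}")
```

Because the statistic depends on where the two position distributions diverge rather than on a signed burden count, a test of this general shape does not require all variants to act in the same direction, which is consistent with the description's remark about deleterious and protective variants.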