Haplotype Inference through Sequential Monte Carlo

Technological advances in the last decade have given rise to large genome-wide studies, which have helped researchers gain better insight into the genetic basis of many common diseases. As the number of samples and genome coverage have increased dramatically, it is currently typical that individuals are...

Bibliographic Details
Main Author: Iliadis, Alexandros
Language: English
Published: 2013
Subjects: Biometry; Engineering; Genetics
Online Access: https://doi.org/10.7916/D8SF33CZ
id ndltd-columbia.edu-oai-academiccommons.columbia.edu-10.7916-D8SF33CZ
record_format oai_dc
spelling ndltd-columbia.edu-oai-academiccommons.columbia.edu-10.7916-D8SF33CZ 2019-05-09T15:14:10Z Haplotype Inference through Sequential Monte Carlo Iliadis, Alexandros 2013 Theses Biometry Engineering Genetics English https://doi.org/10.7916/D8SF33CZ
collection NDLTD
language English
sources NDLTD
topic Biometry
Engineering
Genetics
spellingShingle Biometry
Engineering
Genetics
Iliadis, Alexandros
Haplotype Inference through Sequential Monte Carlo
description Technological advances in the last decade have given rise to large genome-wide studies, which have helped researchers gain better insight into the genetic basis of many common diseases. As the number of samples and genome coverage have increased dramatically, it is now typical for individuals to be genotyped on high-throughput platforms at more than 500,000 single nucleotide polymorphisms (SNPs). At the same time, theoretical and empirical arguments have been made for the use of haplotypes, i.e. combinations of alleles at multiple loci in individual chromosomes, as opposed to genotypes, so the problem of haplotype inference is particularly relevant. Existing haplotyping methods include population-based methods, methods for pooled DNA samples, and methods for family and pedigree data. Furthermore, the vast amount of available data poses new challenges for haplotyping algorithms. Candidate methods should scale well with the size of the datasets, as the number of loci and the number of individuals run well into the thousands. In addition, as genotyping can be performed routinely, researchers encounter a number of specific new scenarios, which can be seen as hybrids of the population and pedigree inference scenarios and require special care to incorporate the maximum amount of information. In this thesis we present a Sequential Monte Carlo framework (TDS) and tailor it to address instances of haplotype inference and frequency estimation problems. Specifically, we first adjust our framework to perform haplotype inference in trio families, resulting in a methodology that demonstrates an excellent tradeoff between speed and accuracy. Subsequently, we extend our method to handle general nuclear families and demonstrate the gain from using our approach as opposed to alternative scenarios.
We further address the problem of haplotype inference in pooled data, where we show that our method achieves improved performance over existing approaches in datasets with a large number of markers. Finally, we present a framework to handle the haplotype inference problem in regions of CNV/SNP data. Using our approach, we can phase datasets where the ploidy of an individual can vary along the region and each individual can have different breakpoints.
author Iliadis, Alexandros
author_facet Iliadis, Alexandros
author_sort Iliadis, Alexandros
title Haplotype Inference through Sequential Monte Carlo
title_short Haplotype Inference through Sequential Monte Carlo
title_full Haplotype Inference through Sequential Monte Carlo
title_fullStr Haplotype Inference through Sequential Monte Carlo
title_full_unstemmed Haplotype Inference through Sequential Monte Carlo
title_sort haplotype inference through sequential monte carlo
publishDate 2013
url https://doi.org/10.7916/D8SF33CZ
work_keys_str_mv AT iliadisalexandros haplotypeinferencethroughsequentialmontecarlo
_version_ 1719046071597924352