Efficient algorithms for polyploid haplotype phasing

Abstract. Background: Inference of haplotypes, or the sequence of alleles along the same chromosomes, is a fundamental problem in genetics and is a key component for many analyses including admixture mapping, identifying regions of identity by descent and imputation. Haplotype phasing based on sequenc...

Bibliographic Details
Main Authors: Dan He, Subrata Saha, Richard Finkers, Laxmi Parida
Format: Article
Language: English
Published: BMC 2018-05-01
Series: BMC Genomics
Online Access: http://link.springer.com/article/10.1186/s12864-018-4464-9
id doaj-df81ce8052a24f9cb33675b2b2cdcbbc
record_format Article
spelling doaj-df81ce8052a24f9cb33675b2b2cdcbbc 2020-11-24T21:22:13Z eng BMC BMC Genomics 1471-2164 2018-05-01 19 S2 171 180 10.1186/s12864-018-4464-9
Efficient algorithms for polyploid haplotype phasing
Dan He (0), Subrata Saha (1), Richard Finkers (2), Laxmi Parida (3)
(0) College of Computer Science and Software, Shenzhen University
(1) IBM T.J. Watson Research Center
(2) Wageningen University & Research
(3) IBM T.J. Watson Research Center
http://link.springer.com/article/10.1186/s12864-018-4464-9
collection DOAJ
language English
format Article
sources DOAJ
author Dan He
Subrata Saha
Richard Finkers
Laxmi Parida
title Efficient algorithms for polyploid haplotype phasing
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2018-05-01
description Abstract. Background: Inference of haplotypes, or the sequence of alleles along the same chromosomes, is a fundamental problem in genetics and a key component of many analyses, including admixture mapping, identifying regions of identity by descent, and imputation. Haplotype phasing based on sequencing reads has attracted considerable attention. Diploid haplotype phasing, where the two haplotypes are complementary, has been studied extensively. In this work, we focused on polyploid haplotype phasing, where we aim to phase more than two haplotypes at the same time from sequencing data. The problem is much more complicated because the search space becomes much larger and the haplotypes no longer need to be complementary. Results: We proposed two algorithms: (1) Poly-Harsh, a Gibbs sampling based algorithm that alternately samples haplotypes and read assignments to minimize the mismatches between the reads and the phased haplotypes; (2) an efficient algorithm to concatenate haplotype blocks into contiguous haplotypes. Conclusions: Our experiments showed that our method is able to improve the quality of the phased haplotypes over the state-of-the-art methods. To our knowledge, our algorithm for haplotype block concatenation is the first algorithm that leverages the shared information across multiple individuals to construct contiguous haplotypes. Our experiments showed that it is both efficient and effective.
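The alternating scheme mentioned in the abstract (sample read assignments, then sample haplotypes, aiming for few read/haplotype mismatches) can be illustrated with a toy sketch. This is not the authors' Poly-Harsh implementation: the binary allele encoding, the reads-as-dictionaries format, the `beta` weighting parameter, and all function names are assumptions made purely for illustration.

```python
import math
import random

def mismatches(read, hap):
    """Count covered sites where the read allele differs from the haplotype."""
    return sum(1 for site, allele in read.items() if hap[site] != allele)

def total_cost(reads, haps, assign):
    """Total mismatches between each read and the haplotype it is assigned to."""
    return sum(mismatches(r, haps[a]) for r, a in zip(reads, assign))

def phase(reads, n_sites, ploidy, iters=200, beta=2.0, seed=0):
    """Toy alternating sampler (hypothetical; binary alleles assumed)."""
    rng = random.Random(seed)
    haps = [[rng.randint(0, 1) for _ in range(n_sites)] for _ in range(ploidy)]
    assign = [rng.randrange(ploidy) for _ in reads]
    best = (total_cost(reads, haps, assign), [h[:] for h in haps], assign[:])
    for _ in range(iters):
        # Step 1: resample each read's haplotype given the current haplotypes.
        for i, read in enumerate(reads):
            m = [mismatches(read, h) for h in haps]
            lo = min(m)  # shift so at least one weight is 1 (avoids underflow)
            weights = [math.exp(-beta * (x - lo)) for x in m]
            assign[i] = rng.choices(range(ploidy), weights=weights)[0]
        # Step 2: resample each allele given the reads assigned to that haplotype.
        for k in range(ploidy):
            counts = [[0, 0] for _ in range(n_sites)]
            for read, a in zip(reads, assign):
                if a == k:
                    for site, allele in read.items():
                        counts[site][allele] += 1
            for s in range(n_sites):
                # Allele 0 mismatches every assigned read showing 1, and vice versa.
                m0, m1 = counts[s][1], counts[s][0]
                lo = min(m0, m1)
                w0 = math.exp(-beta * (m0 - lo))
                w1 = math.exp(-beta * (m1 - lo))
                haps[k][s] = rng.choices([0, 1], weights=[w0, w1])[0]
        cost = total_cost(reads, haps, assign)
        if cost < best[0]:
            best = (cost, [h[:] for h in haps], assign[:])
    return best  # (lowest cost seen, haplotypes, read assignments)
```

A call such as `phase(reads, n_sites=3, ploidy=3)`, where each read is a dictionary mapping site index to allele (for example `{0: 0, 1: 1}`), returns the lowest-mismatch configuration seen during sampling. Keeping the best state seen, rather than the final sample, is a design choice of this sketch.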
url http://link.springer.com/article/10.1186/s12864-018-4464-9