PRIMAL: Fast and accurate pedigree-based imputation from sequence data in a founder population.

Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL...

Bibliographic Details
Main Authors: Oren E Livne, Lide Han, Gorka Alkorta-Aranburu, William Wentworth-Sheilds, Mark Abney, Carole Ober, Dan L Nicolae
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2015-03-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC4348507?pdf=render
id doaj-36def3047b8d4177a975db5b3c22b4cc
record_format Article
spelling doaj-36def3047b8d4177a975db5b3c22b4cc; 2020-11-25T02:27:30Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; ISSN 1553-734X, 1553-7358; 2015-03-01; vol. 11, no. 3, e1004139; doi:10.1371/journal.pcbi.1004139; PRIMAL: Fast and accurate pedigree-based imputation from sequence data in a founder population.; Oren E Livne, Lide Han, Gorka Alkorta-Aranburu, William Wentworth-Sheilds, Mark Abney, Carole Ober, Dan L Nicolae; http://europepmc.org/articles/PMC4348507?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Oren E Livne
Lide Han
Gorka Alkorta-Aranburu
William Wentworth-Sheilds
Mark Abney
Carole Ober
Dan L Nicolae
title PRIMAL: Fast and accurate pedigree-based imputation from sequence data in a founder population.
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2015-03-01
description Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost.
url http://europepmc.org/articles/PMC4348507?pdf=render