PRIMAL: Fast and accurate pedigree-based imputation from sequence data in a founder population.
Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL...
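Main Authors: | Oren E Livne, Lide Han, Gorka Alkorta-Aranburu, William Wentworth-Sheilds, Mark Abney, Carole Ober, Dan L Nicolae |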
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2015-03-01 |
Series: | PLoS Computational Biology |
Online Access: | http://europepmc.org/articles/PMC4348507?pdf=render |
id |
doaj-36def3047b8d4177a975db5b3c22b4cc |
---|---|
record_format |
Article |
spelling |
doaj-36def3047b8d4177a975db5b3c22b4cc 2020-11-25T02:27:30Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2015-03-01 11 3 e1004139 10.1371/journal.pcbi.1004139 PRIMAL: Fast and accurate pedigree-based imputation from sequence data in a founder population. Oren E Livne Lide Han Gorka Alkorta-Aranburu William Wentworth-Sheilds Mark Abney Carole Ober Dan L Nicolae Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost. http://europepmc.org/articles/PMC4348507?pdf=render |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Oren E Livne Lide Han Gorka Alkorta-Aranburu William Wentworth-Sheilds Mark Abney Carole Ober Dan L Nicolae |
spellingShingle |
Oren E Livne Lide Han Gorka Alkorta-Aranburu William Wentworth-Sheilds Mark Abney Carole Ober Dan L Nicolae PRIMAL: Fast and accurate pedigree-based imputation from sequence data in a founder population. PLoS Computational Biology |
author_facet |
Oren E Livne Lide Han Gorka Alkorta-Aranburu William Wentworth-Sheilds Mark Abney Carole Ober Dan L Nicolae |
author_sort |
Oren E Livne |
title |
PRIMAL: Fast and accurate pedigree-based imputation from sequence data in a founder population. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS Computational Biology |
issn |
1553-734X 1553-7358 |
publishDate |
2015-03-01 |
description |
Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost. |
url |
http://europepmc.org/articles/PMC4348507?pdf=render |
_version_ |
1724842823341047808 |