Fast accurate missing SNP genotype local imputation

<p>Abstract</p> <p>Background</p> <p>Single nucleotide polymorphism (SNP) genotyping assays normally give rise to a certain percentage of no-calls; the problem becomes severe when the target organisms, such as cattle, do not have a high-resolution genomic sequence. Missing...


Bibliographic Details
Main Authors: Wang Yining, Cai Zhipeng, Stothard Paul, Moore Steve, Goebel Randy, Wang Lusheng, Lin Guohui
Format: Article
Language: English
Published: BMC 2012-08-01
Series: BMC Research Notes
Online Access: http://www.biomedcentral.com/1756-0500/5/404
id doaj-97f3093660504a7296d57d765468d56b
record_format Article
spelling doaj-97f3093660504a7296d57d765468d56b 2020-11-25T01:31:37Z eng BMC BMC Research Notes 1756-0500 2012-08-01 5 1 404 10.1186/1756-0500-5-404 Fast accurate missing SNP genotype local imputation Wang Yining Cai Zhipeng Stothard Paul Moore Steve Goebel Randy Wang Lusheng Lin Guohui http://www.biomedcentral.com/1756-0500/5/404
collection DOAJ
language English
format Article
sources DOAJ
author Wang Yining
Cai Zhipeng
Stothard Paul
Moore Steve
Goebel Randy
Wang Lusheng
Lin Guohui
title Fast accurate missing SNP genotype local imputation
publisher BMC
series BMC Research Notes
issn 1756-0500
publishDate 2012-08-01
description <p>Abstract</p> <p>Background</p> <p>Single nucleotide polymorphism (SNP) genotyping assays normally give rise to a certain percentage of no-calls; the problem becomes severe when the target organisms, such as cattle, do not have a high-resolution genomic sequence. Missing SNP genotypes, when related to target traits, would confound downstream data analyses such as genome-wide association studies (GWAS). Existing methods for recovering the missing values are only partly successful: they are either accurate but not fast enough, or fast but not accurate enough.</p> <p>Results</p> <p>For a target missing genotype, our local imputation process uses only the SNP loci within a genetic distance vicinity and only the samples within a similarity vicinity. For missing genotype imputation, comparative performance evaluations through extensive simulation studies using real human and cattle genotype datasets demonstrated that our nearest-neighbor-based local imputation method was one of the most efficient methods, and outperformed existing methods except the time-consuming fastPHASE; for missing haplotype allele imputation, comparative performance evaluations using real mouse haplotype datasets demonstrated that our method was not only one of the most efficient methods, but also one of the most accurate.</p> <p>Conclusions</p> <p>Given that fastPHASE requires a long imputation time on medium- to high-density datasets, and that our nearest-neighbor-based local imputation method performed only slightly worse than fastPHASE, yet better than all other methods, one might want to adopt our method as an alternative missing SNP genotype or missing haplotype allele imputation method.</p>
url http://www.biomedcentral.com/1756-0500/5/404
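The Results paragraph of the abstract describes restricting imputation to SNP loci within a distance vicinity and to samples within a similarity vicinity. The following is only a minimal sketch of that general idea, not the authors' implementation. The 0/1/2 genotype coding, the `MISSING` sentinel, the function name `impute_local`, the `window` and `k` parameters, the match-fraction similarity, and the majority vote are all assumptions made for illustration; the abstract does not specify any of them.

```python
from collections import Counter

MISSING = -1  # assumed sentinel for a no-call (not from the article)


def impute_local(genotypes, positions, sample, locus, window=2.0, k=3):
    """Guess genotypes[sample][locus] from nearby loci and similar samples.

    genotypes: list of per-sample lists, coded 0/1/2 or MISSING (assumed coding).
    positions: map position of each locus, in arbitrary distance units.
    """
    # Distance vicinity: loci close to the target locus, excluding the target.
    nearby = [j for j, pos in enumerate(positions)
              if j != locus and abs(pos - positions[locus]) <= window]

    def similarity(other):
        # Fraction of nearby loci, called in both samples, where genotypes agree.
        pairs = [(genotypes[sample][j], genotypes[other][j]) for j in nearby]
        called = [(a, b) for a, b in pairs if a != MISSING and b != MISSING]
        if not called:
            return 0.0
        return sum(a == b for a, b in called) / len(called)

    # Candidate donors: other samples with a called genotype at the target locus.
    donors = [i for i in range(len(genotypes))
              if i != sample and genotypes[i][locus] != MISSING]
    if not donors:
        return MISSING  # nothing to borrow from

    # Similarity vicinity: keep only the k most similar donors.
    donors.sort(key=similarity, reverse=True)
    votes = Counter(genotypes[i][locus] for i in donors[:k])

    # Majority vote; ties resolve to the smallest genotype code for determinism.
    best = max(votes.values())
    return min(g for g, count in votes.items() if count == best)


# Toy data: sample 0 has a no-call at locus 2; the last locus is far away.
positions = [1.0, 1.5, 2.0, 2.5, 9.0]
genotypes = [
    [0, 1, MISSING, 2, 0],
    [0, 1, 1, 2, 2],
    [0, 1, 1, 2, 1],
    [2, 0, 0, 0, 0],
    [2, 1, 0, 0, 0],
]
print(impute_local(genotypes, positions, sample=0, locus=2, window=2.0, k=2))
```

In this toy example the two samples that match sample 0 on all nearby loci supply the vote, while the distant locus at position 9.0 is ignored. A real method would also need a principled choice of vicinity sizes, which this sketch leaves as plain parameters.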