Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect

Abstract. Background: The increasing availability of whole-genome sequence data is expected to increase the accuracy of genomic prediction. However, results from simulation studies and analysis of real data do not always show an increase in accuracy from sequence data compared to high-density (HD) sin...

Bibliographic Details
Main Authors: Irene van den Berg, Phil J. Bowman, Iona M. MacLeod, Ben J. Hayes, Tingting Wang, Sunduimijid Bolormaa, Mike E. Goddard
Format: Article
Language: eng
Published: BMC 2017-09-01
Series: Genetics Selection Evolution
Online Access: http://link.springer.com/article/10.1186/s12711-017-0347-9
id doaj-6cbfa0f482f04447945300d8dc529f52
record_format Article
spelling doaj-6cbfa0f482f04447945300d8dc529f52 2020-11-24T21:11:26Z
eng
BMC
Genetics Selection Evolution
1297-9686
2017-09-01
vol. 49, no. 1, pp. 1-15
10.1186/s12711-017-0347-9
Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect
Irene van den Berg (Faculty of Veterinary and Agricultural Science, University of Melbourne)
Phil J. Bowman (Agriculture Victoria, AgriBio, Centre for AgriBioscience)
Iona M. MacLeod (Agriculture Victoria, AgriBio, Centre for AgriBioscience)
Ben J. Hayes (Agriculture Victoria, AgriBio, Centre for AgriBioscience)
Tingting Wang (Agriculture Victoria, AgriBio, Centre for AgriBioscience)
Sunduimijid Bolormaa (Agriculture Victoria, AgriBio, Centre for AgriBioscience)
Mike E. Goddard (Faculty of Veterinary and Agricultural Science, University of Melbourne)
http://link.springer.com/article/10.1186/s12711-017-0347-9
collection DOAJ
language eng
format Article
sources DOAJ
author Irene van den Berg
Phil J. Bowman
Iona M. MacLeod
Ben J. Hayes
Tingting Wang
Sunduimijid Bolormaa
Mike E. Goddard
title Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect
publisher BMC
series Genetics Selection Evolution
issn 1297-9686
publishDate 2017-09-01
description Abstract
Background: The increasing availability of whole-genome sequence data is expected to increase the accuracy of genomic prediction. However, results from simulation studies and analysis of real data do not always show an increase in accuracy from sequence data compared to high-density (HD) single nucleotide polymorphism (SNP) chip genotypes. In addition, the sheer number of variants makes analysis of all variants and accurate estimation of all effects computationally challenging. Our objective was to find a strategy to approximate the analysis of whole-sequence data with a Bayesian variable selection model. Using a simulated dataset, we applied a Bayes R hybrid model to analyse whole-sequence data, test the effect of dropping a proportion of variants during the analysis, and test how the analysis can be split into separate analyses per chromosome to reduce the elapsed computing time. We also investigated the effect of imputation errors on prediction accuracy. Subsequently, we applied the approach to a dataset that contained imputed sequences and records for production and fertility traits for 38,492 Holstein, Jersey, Australian Red and crossbred bulls and cows.
Results: With the simulated dataset, prediction accuracy for a breed that was not represented in the training population increased substantially with sequence data compared to HD SNP data. Either dropping part of the variants during the analysis or splitting the analysis into separate analyses per chromosome decreased accuracy compared to analysing whole-sequence data. However, first dropping variants from each chromosome and then reanalysing the retained variants together resulted in an accuracy similar to that obtained when analysing whole-sequence data. Adding imputation errors decreased prediction accuracy, especially for errors in the validation population. With real data, using sequence variants resulted in accuracies that were similar to those obtained with the HD SNPs.
Conclusions: We present an efficient approach to approximate analysis of whole-sequence data with a Bayesian variable selection model. The lack of increase in prediction accuracy when applied to real data could be due to imputation errors, which underlines the importance of developing more accurate methods of imputation or directly genotyping sequence variants that have a major effect in the prediction equation.
url http://link.springer.com/article/10.1186/s12711-017-0347-9