Prediction performance of linear models and gradient boosting machine on complex phenotypes in outbred mice

Bibliographic Details
Main Authors: Bink, M.C.A.M. (Author), Calus, M.P.L. (Author), Churchill, G.A. (Author), Perez, B.C. (Author), Svenson, K.L. (Author)
Format: Article
Language: English
Published: NLM (Medline) 2022
Online Access: https://doi.org/10.1093/g3journal/jkac039 (full text at publisher)
LEADER 03013nam a2200397Ia 4500
001 10-1093-g3journal-jkac039
008 220425s2022 CNT 000 0 und d
020 |a 21601836 (ISSN) 
245 1 0 |a Prediction performance of linear models and gradient boosting machine on complex phenotypes in outbred mice 
260 0 |b NLM (Medline)  |c 2022 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1093/g3journal/jkac039 
520 3 |a We compared the performance of linear (GBLUP, BayesB, and elastic net) methods to a nonparametric tree-based ensemble (gradient boosting machine) method for genomic prediction of complex traits in mice. The dataset used contained genotypes for 50,112 SNP markers and phenotypes for 835 animals from 6 generations. Traits analyzed were bone mineral density, body weight at 10, 15, and 20 weeks, fat percentage, circulating cholesterol, glucose, insulin, triglycerides, and urine creatinine. The youngest generation was used as a validation subset, and predictions were based on all older generations. Model performance was evaluated by comparing predictions for animals in the validation subset against their adjusted phenotypes. Linear models outperformed gradient boosting machine for 7 out of 10 traits. For bone mineral density, cholesterol, and glucose, the gradient boosting machine model showed better prediction accuracy and lower relative root mean squared error than the linear models. Interestingly, for these 3 traits, there is evidence of a relevant portion of phenotypic variance being explained by epistatic effects. Using a subset of top markers selected from a gradient boosting machine model helped for some of the traits to improve the accuracy of prediction when these were fitted into linear and gradient boosting machine models. Our results indicate that gradient boosting machine is more strongly affected by data size and decreased connectedness between reference and validation sets than the linear models. Although the linear models outperformed gradient boosting machine for the polygenic traits, our results suggest that gradient boosting machine is a competitive method to predict complex traits with assumed epistatic effects. © The Author(s) 2022. Published by Oxford University Press on behalf of Genetics Society of America. 
650 0 4 |a animal 
650 0 4 |a Animals 
650 0 4 |a Genomic Prediction 
650 0 4 |a genomics 
650 0 4 |a Genomics 
650 0 4 |a genotype 
650 0 4 |a Genotype 
650 0 4 |a GenPred 
650 0 4 |a Linear Models 
650 0 4 |a Mice 
650 0 4 |a mouse 
650 0 4 |a multifactorial inheritance 
650 0 4 |a Multifactorial Inheritance 
650 0 4 |a phenotype 
650 0 4 |a Phenotype 
650 0 4 |a procedures 
650 0 4 |a Shared Data Resources 
650 0 4 |a statistical model 
700 1 |a Bink, M.C.A.M.  |e author 
700 1 |a Calus, M.P.L.  |e author 
700 1 |a Churchill, G.A.  |e author 
700 1 |a Perez, B.C.  |e author 
700 1 |a Svenson, K.L.  |e author 
773 |t G3 (Bethesda, Md.)
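The evaluation scheme described in the abstract uses the older generations as the reference set and the youngest generation as the validation set, and compares predictions against adjusted phenotypes. It can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that "prediction accuracy" is the Pearson correlation between predicted and adjusted phenotypes, and that "relative RMSE" is the RMSE divided by the mean adjusted phenotype of the validation animals; the paper may define either metric differently. The `Animal` record, its field names, and the toy numbers are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Animal:
    """Hypothetical record for one genotyped, phenotyped mouse."""
    generation: int        # breeding generation the animal belongs to
    adjusted_pheno: float  # phenotype after adjustment (assumed already computed)


def forward_split(animals):
    """Youngest generation -> validation set; all older generations -> reference set."""
    youngest = max(a.generation for a in animals)
    reference = [a for a in animals if a.generation < youngest]
    validation = [a for a in animals if a.generation == youngest]
    return reference, validation


def pearson(x, y):
    """Pearson correlation, used here as the accuracy measure (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def relative_rmse(observed, predicted):
    """RMSE divided by the mean observed value (assumed definition)."""
    n = len(observed)
    rmse = sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (sum(observed) / n)


if __name__ == "__main__":
    # Toy data: (generation, adjusted phenotype); values are made up.
    animals = [Animal(g, y) for g, y in
               [(1, 20.1), (1, 22.3), (2, 21.0), (3, 23.5), (3, 19.8), (3, 21.7)]]
    reference, validation = forward_split(animals)
    observed = [a.adjusted_pheno for a in validation]
    # Hypothetical output of a model (GBLUP, BayesB, elastic net, or GBM)
    # trained on the reference animals only.
    predicted = [23.0, 20.5, 21.2]
    print(len(reference), len(validation))
    print(round(pearson(observed, predicted), 3),
          round(relative_rmse(observed, predicted), 3))
```

Splitting by generation rather than at random mimics predicting new candidates from their ancestors. It also lowers the connectedness between reference and validation animals, which the abstract reports affects gradient boosting machine more strongly than the linear models.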