BWGS: A R package for genomic selection and its application to a wheat breeding programme.

We developed an integrated R library called BWGS to enable easy computation of Genomic Estimates of Breeding values (GEBV) for genomic selection. BWGS, for BreedWheat Genomic selection, was developed in the framework of a cooperative private-public partnership project called Breedwheat (https://bree...


Bibliographic Details
Main Authors: Gilles Charmet, Louis-Gautier Tran, Jérôme Auzanneau, Renaud Rincent, Sophie Bouchet
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2020-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0222733
id doaj-11021e6fd7d44f28ae05f602b7037e3f
record_format Article
spelling doaj-11021e6fd7d44f28ae05f602b7037e3f 2021-03-03T21:39:34Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2020-01-01 15 4 e0222733 10.1371/journal.pone.0222733 BWGS: A R package for genomic selection and its application to a wheat breeding programme. Gilles Charmet, Louis-Gautier Tran, Jérôme Auzanneau, Renaud Rincent, Sophie Bouchet https://doi.org/10.1371/journal.pone.0222733
collection DOAJ
language English
format Article
sources DOAJ
author Gilles Charmet
Louis-Gautier Tran
Jérôme Auzanneau
Renaud Rincent
Sophie Bouchet
title BWGS: A R package for genomic selection and its application to a wheat breeding programme.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2020-01-01
description We developed an integrated R library called BWGS to enable easy computation of genomic estimates of breeding values (GEBV) for genomic selection. BWGS, for BreedWheat Genomic Selection, was developed within a cooperative private-public partnership project called Breedwheat (https://breedwheat.fr) and relies on existing R libraries, all freely available from CRAN servers. The two main functions run 1) replicated random cross-validations within a training set of genotyped and phenotyped lines and 2) GEBV prediction for a set of genotyped-only lines. Options are available for 1) missing data imputation, 2) marker and training set selection and 3) genomic prediction with 15 different methods, either parametric or semi-parametric. The usefulness and efficiency of BWGS are illustrated using a population of wheat lines from a real breeding programme. Adjusted yield data from historical trials (highly unbalanced design) were used to test the options of BWGS. In total, 760 candidate lines with adjusted phenotypes and genotypes for 47 839 robust SNPs were used. With a simple desktop computer, we obtained results comparable to previously published results on wheat genomic selection. As predicted by theory, the factors that most influence predictive ability, for a given trait of moderate heritability, are the size of the training population and a minimum number of markers sufficient to capture the information of every QTL. Missing data up to 40%, if randomly distributed, do not degrade predictive ability once imputed, and up to 80% randomly distributed missing data are still acceptable once imputed with the Expectation-Maximization method of the rrBLUP package. It is worth noting that selecting the markers most associated with the trait does improve predictive ability, compared with using the whole set of markers, but only when marker selection is made on the whole population. When marker selection is made only on the sampled training set, this advantage nearly disappears, since it was due to overfitting. Few differences are observed among the 15 prediction models with this dataset. Although non-parametric methods, which are supposed to capture non-additive effects, have slightly better predictive accuracy, differences remain small. Finally, the GEBV from the 15 prediction models are all highly correlated with each other. These results are encouraging for an efficient use of genomic selection in applied breeding programmes, and BWGS is a simple and powerful toolbox to apply in breeding programmes or training activities.
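The overfitting effect described in the abstract (marker selection on the whole population versus on the training set only) can be illustrated with a small simulation. The sketch below is not BWGS code and does not use the authors' data: it assumes random markers coded -1/0/1, a phenotype that is pure noise, and a deliberately crude predictor (a correlation-weighted marker score). All function names and parameter values are hypothetical choices made for illustration.

```python
import random
import statistics


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5 if saa > 0 and sbb > 0 else 0.0


def one_replicate(rng, n_lines=100, n_markers=300, n_test=20, n_keep=20):
    # Assumption: simulated markers and a pure-noise phenotype, so no marker
    # carries real signal and an unbiased predictive ability should be near zero.
    geno = [[rng.choice((-1, 0, 1)) for _ in range(n_lines)] for _ in range(n_markers)]
    pheno = [rng.gauss(0.0, 1.0) for _ in range(n_lines)]

    # Random split into a test set and a training set.
    idx = list(range(n_lines))
    rng.shuffle(idx)
    test, train = idx[:n_test], idx[n_test:]

    def corr_on(marker, lines):
        return pearson([marker[i] for i in lines], [pheno[i] for i in lines])

    def predictive_ability(selection_lines):
        # Keep the n_keep markers most correlated (in absolute value) with the
        # phenotype, where the correlation is computed on selection_lines.
        ranked = sorted(range(n_markers),
                        key=lambda j: -abs(corr_on(geno[j], selection_lines)))
        kept = ranked[:n_keep]
        # Marker weights are always estimated on the training lines only.
        weights = {j: corr_on(geno[j], train) for j in kept}
        pred = [sum(weights[j] * geno[j][i] for j in kept) for i in test]
        # Predictive ability: correlation of predictions with test phenotypes.
        return pearson(pred, [pheno[i] for i in test])

    # First value: selection on all lines (test lines leak into the selection).
    # Second value: selection on training lines only.
    return predictive_ability(idx), predictive_ability(train)


def run(n_rep=10, seed=1):
    rng = random.Random(seed)
    results = [one_replicate(rng) for _ in range(n_rep)]
    leaky = statistics.fmean(r[0] for r in results)
    honest = statistics.fmean(r[1] for r in results)
    return leaky, honest


if __name__ == "__main__":
    leaky, honest = run()
    print(f"selection on whole population:  {leaky:.2f}")
    print(f"selection on training set only: {honest:.2f}")
```

Because the simulated phenotype contains no genetic signal, any clearly positive predictive ability in the first case comes only from test lines having taken part in marker selection, which is the mechanism the abstract attributes the apparent advantage to.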
url https://doi.org/10.1371/journal.pone.0222733