A comparison of genotyping-by-sequencing analysis methods on low-coverage crop datasets shows advantages of a new workflow, GB-eaSy

Abstract. Background: Genotyping-by-sequencing (GBS), a method to identify genetic variants and quickly genotype samples, reduces genome complexity by using restriction enzymes to divide the genome into fragments whose ends are sequenced on short-read sequencing platforms. While cost-effective, this m...


Bibliographic Details
Main Authors:	Daniel P. Wickland, Gopal Battu, Karen A. Hudson, Brian W. Diers, Matthew E. Hudson
Format:	Article
Language:	English
Published:	BMC 2017-12-01
Series:	BMC Bioinformatics
Subjects:	GBS; WGS; Bioinformatics pipelines; Variant calling; Soybean; Crops
Online Access:	http://link.springer.com/article/10.1186/s12859-017-2000-6

id	doaj-46b3372ef31a43f9b5825dd63b520386
record_format	Article
spelling	doaj-46b3372ef31a43f9b5825dd63b520386; 2020-11-24T20:46:28Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2017-12-01; 181112; 10.1186/s12859-017-2000-6; A comparison of genotyping-by-sequencing analysis methods on low-coverage crop datasets shows advantages of a new workflow, GB-eaSy; Daniel P. Wickland (Department of Crop Sciences, University of Illinois at Urbana-Champaign); Gopal Battu (Department of Crop Sciences, University of Illinois at Urbana-Champaign); Karen A. Hudson (USDA-ARS Crop Production and Pest Control Research Unit); Brian W. Diers (Department of Crop Sciences, University of Illinois at Urbana-Champaign); Matthew E. Hudson (Department of Crop Sciences, University of Illinois at Urbana-Champaign); http://link.springer.com/article/10.1186/s12859-017-2000-6; GBS; WGS; Bioinformatics pipelines; Variant calling; Soybean; Crops
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Daniel P. Wickland; Gopal Battu; Karen A. Hudson; Brian W. Diers; Matthew E. Hudson
spellingShingle	Daniel P. Wickland; Gopal Battu; Karen A. Hudson; Brian W. Diers; Matthew E. Hudson; A comparison of genotyping-by-sequencing analysis methods on low-coverage crop datasets shows advantages of a new workflow, GB-eaSy; BMC Bioinformatics; GBS; WGS; Bioinformatics pipelines; Variant calling; Soybean; Crops
author_facet	Daniel P. Wickland; Gopal Battu; Karen A. Hudson; Brian W. Diers; Matthew E. Hudson
author_sort	Daniel P. Wickland
title	A comparison of genotyping-by-sequencing analysis methods on low-coverage crop datasets shows advantages of a new workflow, GB-eaSy
publisher	BMC
series	BMC Bioinformatics
issn	1471-2105
publishDate	2017-12-01
description	Abstract. Background: Genotyping-by-sequencing (GBS), a method to identify genetic variants and quickly genotype samples, reduces genome complexity by using restriction enzymes to divide the genome into fragments whose ends are sequenced on short-read sequencing platforms. While cost-effective, this method produces extensive missing data and requires complex bioinformatics analysis. GBS is most commonly used on crop plant genomes, and because crop plants have highly variable ploidy and repeat content, the performance of GBS analysis software can vary by target organism. Here we focus our analysis on soybean, a polyploid crop with a highly duplicated genome, relatively little public GBS data and few dedicated tools. Results: We compared the performance of five GBS pipelines using low-coverage Illumina sequence data from three soybean populations. To address issues identified with existing methods, we developed GB-eaSy, a GBS bioinformatics workflow that incorporates widely used genomics tools, parallelization and automation to increase the accuracy and accessibility of GBS data analysis. Compared to other GBS pipelines, GB-eaSy rapidly and accurately identified the greatest number of SNPs, with SNP calls closely concordant with whole-genome sequencing of selected lines. Across all five GBS analysis platforms, SNP calls showed unexpectedly low convergence but generally high accuracy, indicating that the workflows arrived at largely complementary sets of valid SNP calls on the low-coverage data analyzed. Conclusions: We show that GB-eaSy is approximately as good as, or better than, other leading software solutions in the accuracy, yield and missing data fraction of variant calling, as tested on low-coverage genomic data from soybean. It also performs well relative to other solutions in terms of the run time and disk space required. In addition, GB-eaSy is built from existing open-source, modular software packages that are regularly updated and commonly used, making it straightforward to install and maintain. While GB-eaSy outperformed other individual methods on the datasets analyzed, our findings suggest that a comprehensive approach integrating the results from multiple GBS bioinformatics pipelines may be the optimal strategy to obtain the largest, most highly accurate SNP yield possible from low-coverage polyploid sequence data.
topic	GBS; WGS; Bioinformatics pipelines; Variant calling; Soybean; Crops
url	http://link.springer.com/article/10.1186/s12859-017-2000-6


Similar Items