Large scale genome skimming from herbarium material for accurate plant identification and phylogenomics

Bibliographic Details
Main Authors: Paul G. Nevill, Xiao Zhong, Julian Tonti-Filippini, Margaret Byrne, Michael Hislop, Kevin Thiele, Stephen van Leeuwen, Laura M. Boykin, Ian Small
Format: Article
Language: English
Published: BMC 2020-01-01
Series: Plant Methods
Online Access: https://doi.org/10.1186/s13007-019-0534-5
Subjects: Chloroplast; Genome skimming; Herbarium specimens; Next-generation sequencing; Pilbara; Plant DNA barcoding
Volume: 16, Issue 1
ISSN: 1746-4811
Affiliations: Australian Research Council Centre for Mine Site Restoration, School of Molecular and Life Sciences, Curtin University; Australian Research Council Centre of Excellence in Plant Energy Biology, The University of Western Australia; School of Biological Sciences, The University of Western Australia; Biodiversity and Conservation Science, Department of Biodiversity, Conservation and Attractions

Abstract

Background: Herbaria are valuable sources of extensive curated plant material that are now accessible to genetic studies because of advances in high-throughput, next-generation sequencing methods. As an applied assessment of large-scale recovery of plastid and ribosomal genome sequences from herbarium material for plant identification and phylogenomics, we sequenced 672 samples covering 21 families, 142 genera and 530 named and proposed named species. We explored the impact of parameters such as sample age, DNA concentration and quality, read depth and fragment length on plastid assembly error. We also tested the efficacy of DNA sequence information for identifying plant samples using 45 specimens recently collected in the Pilbara.

Results: Genome skimming was effective at producing genomic information at a large scale. Substantial sequence information on the chloroplast genome was obtained from 96.1% of samples, and complete or near-complete sequences of the nuclear ribosomal RNA gene repeat were obtained from 93.3% of samples. We were able to extract sequences for the core DNA barcode regions rbcL and matK from 96% and 93.3% of samples, respectively. Read quality and DNA fragment length had significant effects on sequencing outcomes, and error correction of reads proved essential. Assembly problems were specific to certain taxa with low GC content and high repeat content (Goodenia, Scaevola, Cyperus, Bulbostylis, Fimbristylis), suggesting biological rather than technical explanations. The structure of related genomes was needed to guide the assembly of repeats that exceeded the read length. DNA-based matching proved highly effective and showed that the efficacy for species identification declined in the order cpDNA >> rDNA > matK >> rbcL.

Conclusions: We showed that a large-scale approach to genome sequencing using herbarium specimens produces high-quality complete cpDNA and rDNA sequences as a source of data for DNA barcoding and phylogenomics.
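
The abstract reports that DNA-based matching identified specimens effectively but does not describe the matching procedure. As a generic illustration only (not the authors' pipeline), the sketch below assigns a query sequence to the reference whose k-mer set has the highest Jaccard similarity to the query's. The function names (kmers, jaccard, identify), the default k-mer size, and the toy sequences are hypothetical assumptions.

```python
from typing import Dict, Optional, Set, Tuple


def kmers(seq: str, k: int = 21) -> Set[str]:
    """Return the set of all overlapping k-mers in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    # If the sequence is shorter than k, the range is empty and so is the set.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def identify(query: str, references: Dict[str, str], k: int = 21) -> Tuple[Optional[str], float]:
    """Assign the query to the reference with the highest k-mer Jaccard similarity.

    Hypothetical helper: returns (None, -1.0) when no references are supplied;
    ties keep the first reference encountered in dictionary order.
    """
    query_kmers = kmers(query, k)
    best_name: Optional[str] = None
    best_score = -1.0
    for name, ref_seq in references.items():
        score = jaccard(query_kmers, kmers(ref_seq, k))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


if __name__ == "__main__":
    # Hypothetical toy references; real plastome or rDNA references are far longer.
    refs = {
        "species_A": "ACGTACGTTGCAAGCT" * 3,
        "species_B": "TTGACCGATGGCATCA" * 3,
    }
    print(identify("ACGTACGTTGCAAGCT" * 3, refs, k=5))  # ('species_A', 1.0)
```

Comparing full k-mer sets keeps the sketch alignment-free and easy to follow; at the scale of hundreds of plastomes, an approximation such as MinHash sketching would typically replace the exact set comparison.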