Large scale genome skimming from herbarium material for accurate plant identification and phylogenomics
Abstract. Background: Herbaria are valuable sources of extensive curated plant material that are now accessible to genetic studies because of advances in high-throughput, next-generation sequencing methods. As an applied assessment of large-scale recovery of plastid and ribosomal genome sequences from...
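Main Authors: | Paul G. Nevill, Xiao Zhong, Julian Tonti-Filippini, Margaret Byrne, Michael Hislop, Kevin Thiele, Stephen van Leeuwen, Laura M. Boykin, Ian Small |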
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2020-01-01 |
Series: | Plant Methods |
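Subjects: | Chloroplast; Genome skimming; Herbarium specimens; Next-generation sequencing; Pilbara; Plant DNA barcoding |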
Online Access: | https://doi.org/10.1186/s13007-019-0534-5 |
id |
doaj-9e54c563c4114518a002a4bdb84c6d71 |
---|---|
record_format |
Article |
spelling |
doaj-9e54c563c4114518a002a4bdb84c6d71 2021-01-03T12:05:53Z eng BMC Plant Methods 1746-4811 2020-01-01 16118 10.1186/s13007-019-0534-5 Large scale genome skimming from herbarium material for accurate plant identification and phylogenomics
Authors: Paul G. Nevill (0), Xiao Zhong (1), Julian Tonti-Filippini (2), Margaret Byrne (3), Michael Hislop (4), Kevin Thiele (5), Stephen van Leeuwen (6), Laura M. Boykin (7), Ian Small (8)
Affiliations: (0) Australian Research Council Centre for Mine Site Restoration, School of Molecular and Life Sciences, Curtin University; (1) Australian Research Council Centre of Excellence in Plant Energy Biology, The University of Western Australia; (2) Australian Research Council Centre of Excellence in Plant Energy Biology, The University of Western Australia; (3) School of Biological Sciences, The University of Western Australia; (4) Biodiversity and Conservation Science, Department of Biodiversity, Conservation and Attractions; (5) School of Biological Sciences, The University of Western Australia; (6) Biodiversity and Conservation Science, Department of Biodiversity, Conservation and Attractions; (7) Australian Research Council Centre of Excellence in Plant Energy Biology, The University of Western Australia; (8) Australian Research Council Centre of Excellence in Plant Energy Biology, The University of Western Australia
Abstract. Background: Herbaria are valuable sources of extensive curated plant material that are now accessible to genetic studies because of advances in high-throughput, next-generation sequencing methods. As an applied assessment of large-scale recovery of plastid and ribosomal genome sequences from herbarium material for plant identification and phylogenomics, we sequenced 672 samples covering 21 families, 142 genera and 530 named and proposed named species. We explored the impact of parameters such as sample age, DNA concentration and quality, read depth and fragment length on plastid assembly error. We also tested the efficacy of DNA sequence information for identifying plant samples using 45 specimens recently collected in the Pilbara.
Results: Genome skimming was effective at producing genomic information at large scale. Substantial sequence information on the chloroplast genome was obtained from 96.1% of samples, and complete or near-complete sequences of the nuclear ribosomal RNA gene repeat were obtained from 93.3% of samples. We were able to extract sequences for the core DNA barcode regions rbcL and matK from 96% and 93.3% of samples, respectively. Read quality and DNA fragment length had significant effects on sequencing outcomes, and error correction of reads proved essential. Assembly problems were specific to certain taxa with low GC and high repeat content (Goodenia, Scaevola, Cyperus, Bulbostylis, Fimbristylis), suggesting biological rather than technical explanations. The structure of related genomes was needed to guide the assembly of repeats that exceeded the read length. DNA-based matching proved highly effective and showed that the efficacy for species identification declined in the order cpDNA >> rDNA > matK >> rbcL.
Conclusions: We showed that a large-scale approach to genome sequencing using herbarium specimens produces high-quality complete cpDNA and rDNA sequences as a source of data for DNA barcoding and phylogenomics.
https://doi.org/10.1186/s13007-019-0534-5
Keywords: Chloroplast; Genome skimming; Herbarium specimens; Next-generation sequencing; Pilbara; Plant DNA barcoding |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Paul G. Nevill, Xiao Zhong, Julian Tonti-Filippini, Margaret Byrne, Michael Hislop, Kevin Thiele, Stephen van Leeuwen, Laura M. Boykin, Ian Small |
author_sort |
Paul G. Nevill |
title |
Large scale genome skimming from herbarium material for accurate plant identification and phylogenomics |
publisher |
BMC |
series |
Plant Methods |
issn |
1746-4811 |
publishDate |
2020-01-01 |
description |
Abstract. Background: Herbaria are valuable sources of extensive curated plant material that are now accessible to genetic studies because of advances in high-throughput, next-generation sequencing methods. As an applied assessment of large-scale recovery of plastid and ribosomal genome sequences from herbarium material for plant identification and phylogenomics, we sequenced 672 samples covering 21 families, 142 genera and 530 named and proposed named species. We explored the impact of parameters such as sample age, DNA concentration and quality, read depth and fragment length on plastid assembly error. We also tested the efficacy of DNA sequence information for identifying plant samples using 45 specimens recently collected in the Pilbara. Results: Genome skimming was effective at producing genomic information at large scale. Substantial sequence information on the chloroplast genome was obtained from 96.1% of samples, and complete or near-complete sequences of the nuclear ribosomal RNA gene repeat were obtained from 93.3% of samples. We were able to extract sequences for the core DNA barcode regions rbcL and matK from 96% and 93.3% of samples, respectively. Read quality and DNA fragment length had significant effects on sequencing outcomes, and error correction of reads proved essential. Assembly problems were specific to certain taxa with low GC and high repeat content (Goodenia, Scaevola, Cyperus, Bulbostylis, Fimbristylis), suggesting biological rather than technical explanations. The structure of related genomes was needed to guide the assembly of repeats that exceeded the read length. DNA-based matching proved highly effective and showed that the efficacy for species identification declined in the order cpDNA >> rDNA > matK >> rbcL. Conclusions: We showed that a large-scale approach to genome sequencing using herbarium specimens produces high-quality complete cpDNA and rDNA sequences as a source of data for DNA barcoding and phylogenomics. |
topic |
Chloroplast; Genome skimming; Herbarium specimens; Next-generation sequencing; Pilbara; Plant DNA barcoding |
url |
https://doi.org/10.1186/s13007-019-0534-5 |
_version_ |
1724350872145625088 |