Random sampling causes the low reproducibility of rare eukaryotic OTUs in Illumina COI metabarcoding

Bibliographic Details
Main Authors: Matthieu Leray, Nancy Knowlton
Format: Article
Language: English
Published: PeerJ Inc. 2017-03-01
Series: PeerJ
Online Access:https://peerj.com/articles/3006.pdf
Affiliations: Matthieu Leray and Nancy Knowlton, National Museum of Natural History, Smithsonian Institution, Washington, D.C., USA
Volume: 5
Article number: e3006
DOI: 10.7717/peerj.3006
ISSN: 2167-8359

Abstract
DNA metabarcoding, the PCR-based profiling of natural communities, is becoming the method of choice for biodiversity monitoring because it circumvents some of the limitations inherent to traditional ecological surveys. However, potential sources of bias that can affect the reproducibility of this method remain to be quantified. The interpretation of differences in patterns of sequence abundance and the ecological relevance of rare sequences remain particularly uncertain. Here we used one artificial mock community to explore the significance of abundance patterns and disentangle the effects of two potential biases on data reproducibility: indexed PCR primers and random sampling during Illumina MiSeq sequencing. We amplified a short fragment of the mitochondrial Cytochrome c Oxidase Subunit I (COI) for a single mock sample containing equimolar amounts of total genomic DNA from 34 marine invertebrates belonging to six phyla. We used seven indexed broad-range primers and sequenced the resulting library on two consecutive Illumina MiSeq runs. The total number of Operational Taxonomic Units (OTUs) was ∼4 times higher than expected based on the composition of the mock sample. Moreover, the total number of reads for the 34 components of the mock sample differed by up to three orders of magnitude. However, 79 out of 86 of the unexpected OTUs were represented by <10 sequences that did not appear consistently across replicates. Our data suggest that random sampling of rare OTUs (e.g., small associated fauna such as parasites) accounted for most of the variation in OTU presence–absence, whereas biases associated with indexed PCRs accounted for a larger amount of variation in relative abundance patterns. These results suggest that random sampling during sequencing leads to the low reproducibility of rare OTUs. We suggest that the strategy for handling rare OTUs should depend on the objectives of the study. Systematic removal of rare OTUs may avoid inflating diversity based on common β descriptors but will exclude positive records of taxa that are functionally important. Our results further reinforce the need for technical replicates (parallel PCR and sequencing from the same sample) in metabarcoding experimental designs. Data reproducibility should be determined empirically, as it will depend on the sequencing depth, the type of sample, the sequence analysis pipeline, and the number of replicates. Moreover, estimating relative biomasses or abundances based on read counts remains elusive at the OTU level.

Keywords: Indexed PCR primers, Multiplexing, Reproducibility
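
The abstract's central claim, that random sampling during sequencing can by itself make rare OTUs appear and disappear across replicates, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' analysis pipeline: the community composition, read depth, number of replicates, and the replicate-based filter (`keep_reproducible`, a hypothetical helper) are all illustrative, and PCR bias is deliberately left out so that only sampling variation remains. With a rare OTU at relative abundance p and D reads per replicate, the expected read count is D·p, and the chance of observing it at least once is 1 − (1 − p)^D ≈ 1 − e^(−D·p).

```python
import random
from collections import Counter

# Hypothetical community (illustrative numbers, not data from the study):
# 30 abundant OTUs with equal template amounts and 10 rare OTUs
# (e.g., small associated fauna) at 1/1000 of that amount.
ABUNDANT = {f"abundant_{i}": 1000 for i in range(30)}
RARE = {f"rare_{i}": 1 for i in range(10)}
COMMUNITY = {**ABUNDANT, **RARE}


def sequence_replicate(community, depth, rng):
    """Simulate one sequencing replicate: draw `depth` reads with replacement,
    each assigned to an OTU with probability proportional to its template
    amount (a multinomial draw). Returns read counts per OTU."""
    otus = list(community)
    weights = [community[otu] for otu in otus]
    counts = Counter(rng.choices(otus, weights=weights, k=depth))
    return {otu: counts.get(otu, 0) for otu in otus}


def detection_frequency(replicates):
    """Number of replicates in which each OTU has at least one read."""
    otus = replicates[0].keys()
    return {otu: sum(1 for rep in replicates if rep[otu] > 0) for otu in otus}


def keep_reproducible(replicates, min_replicates):
    """Hypothetical filter: keep OTUs detected in >= min_replicates replicates."""
    freq = detection_frequency(replicates)
    return {otu for otu, n in freq.items() if n >= min_replicates}


rng = random.Random(42)
N_REPLICATES = 7   # assumed number of technical replicates
DEPTH = 20_000     # assumed reads per replicate
replicates = [sequence_replicate(COMMUNITY, DEPTH, rng) for _ in range(N_REPLICATES)]
freq = detection_frequency(replicates)

for otu in ("abundant_0", "rare_0", "rare_1"):
    counts = [rep[otu] for rep in replicates]
    print(f"{otu}: reads per replicate = {counts}, detected in {freq[otu]}/{N_REPLICATES}")

kept = keep_reproducible(replicates, min_replicates=N_REPLICATES)
print(f"OTUs detected in every replicate: {len(kept)} of {len(COMMUNITY)}")
```

Here each rare OTU is expected to receive about 0.67 reads per replicate, so whether it is observed is close to a coin flip, while the abundant OTUs (about 666 reads each) are never missed. Raising `min_replicates` removes more of these inconsistently detected OTUs but also discards genuine rare taxa, which is the trade-off the abstract describes for handling rare OTUs.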