Pre- and post-sequencing recommendations for functional annotation of human fecal metagenomes

Abstract Background Shotgun metagenomes are often assembled prior to annotation of genes which biases the functional capacity of a community towards its most abundant members. For an unbiased assessment of community function, short reads need to be mapped directly to a gene or protein database. The...

Bibliographic Details
Main Authors: Michelle L. Treiber, Diana H. Taft, Ian Korf, David A. Mills, Danielle G. Lemay
Format: Article
Language: English
Published: BMC 2020-02-01
Series: BMC Bioinformatics
Subjects: Metagenomes, Metagenomics, Fecal, Stool, Human, Functional annotation
Online Access: http://link.springer.com/article/10.1186/s12859-020-3416-y
id doaj-e06905bab35f40d1a527294dd471856a
record_format Article
spelling doaj-e06905bab35f40d1a527294dd471856a
2020-11-25T00:28:08Z
eng
BMC
BMC Bioinformatics
1471-2105
2020-02-01
211115
10.1186/s12859-020-3416-y
Pre- and post-sequencing recommendations for functional annotation of human fecal metagenomes
Michelle L. Treiber 0
Diana H. Taft 1
Ian Korf 2
David A. Mills 3
Danielle G. Lemay 4
USDA ARS Western Human Nutrition Research Center
Department of Food Science and Technology, Robert Mondavi Institute for Wine and Food Science, University of California, Davis
Genome Center, University of California
Department of Food Science and Technology, Robert Mondavi Institute for Wine and Food Science, University of California, Davis
USDA ARS Western Human Nutrition Research Center
Abstract. Background: Shotgun metagenomes are often assembled prior to annotation of genes, which biases the functional capacity of a community towards its most abundant members. For an unbiased assessment of community function, short reads need to be mapped directly to a gene or protein database. The ability to detect genes in short read sequences is dependent on pre- and post-sequencing decisions. The objective of the current study was to determine how library size selection, read length and format, protein database, e-value threshold, and sequencing depth impact gene-centric analysis of human fecal microbiomes when using DIAMOND, an alignment tool that is up to 20,000 times faster than BLASTX. Results: Using metagenomes simulated from a database of experimentally verified protein sequences, we find that read length, e-value threshold, and the choice of protein database dramatically impact detection of a known target, with best performance achieved with longer reads, stricter e-value thresholds, and a custom database. Using publicly available metagenomes, we evaluated library size selection, paired end read strategy, and sequencing depth. Longer read lengths were achievable by merging paired ends when the sequencing library was size-selected to enable overlaps. When paired ends could not be merged, a congruent strategy in which both ends are independently mapped was acceptable. Sequencing depths of 5 million merged reads minimized the error of abundance estimates of specific target genes, including an antimicrobial resistance gene. Conclusions: Shotgun metagenomes of DNA extracted from human fecal samples sequenced using the Illumina platform should be size-selected to enable merging of paired end reads and should be sequenced in the PE150 format with a minimum sequencing depth of 5 million merge-able reads to enable detection of specific target genes. Expecting the merged reads to be 180-250 bp in length, the appropriate e-value threshold for DIAMOND would then need to be more strict than the default. Accurate and interpretable results for specific hypotheses will be best obtained using small databases customized for the research question.
http://link.springer.com/article/10.1186/s12859-020-3416-y
Metagenomes
Metagenomics
Fecal
Stool
Human
Functional annotation
collection DOAJ
language English
format Article
sources DOAJ
author Michelle L. Treiber
Diana H. Taft
Ian Korf
David A. Mills
Danielle G. Lemay
author_sort Michelle L. Treiber
title Pre- and post-sequencing recommendations for functional annotation of human fecal metagenomes
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2020-02-01
description Abstract. Background: Shotgun metagenomes are often assembled prior to annotation of genes, which biases the functional capacity of a community towards its most abundant members. For an unbiased assessment of community function, short reads need to be mapped directly to a gene or protein database. The ability to detect genes in short read sequences is dependent on pre- and post-sequencing decisions. The objective of the current study was to determine how library size selection, read length and format, protein database, e-value threshold, and sequencing depth impact gene-centric analysis of human fecal microbiomes when using DIAMOND, an alignment tool that is up to 20,000 times faster than BLASTX. Results: Using metagenomes simulated from a database of experimentally verified protein sequences, we find that read length, e-value threshold, and the choice of protein database dramatically impact detection of a known target, with best performance achieved with longer reads, stricter e-value thresholds, and a custom database. Using publicly available metagenomes, we evaluated library size selection, paired end read strategy, and sequencing depth. Longer read lengths were achievable by merging paired ends when the sequencing library was size-selected to enable overlaps. When paired ends could not be merged, a congruent strategy in which both ends are independently mapped was acceptable. Sequencing depths of 5 million merged reads minimized the error of abundance estimates of specific target genes, including an antimicrobial resistance gene. Conclusions: Shotgun metagenomes of DNA extracted from human fecal samples sequenced using the Illumina platform should be size-selected to enable merging of paired end reads and should be sequenced in the PE150 format with a minimum sequencing depth of 5 million merge-able reads to enable detection of specific target genes. Expecting the merged reads to be 180-250 bp in length, the appropriate e-value threshold for DIAMOND would then need to be more strict than the default. Accurate and interpretable results for specific hypotheses will be best obtained using small databases customized for the research question.
topic Metagenomes
Metagenomics
Fecal
Stool
Human
Functional annotation
url http://link.springer.com/article/10.1186/s12859-020-3416-y
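
Illustrative note on the abstract: the two post-sequencing decisions it names (a stricter e-value threshold, and a congruent strategy for paired ends that cannot be merged) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' pipeline. It assumes alignments are in the standard 12-column tabular format (BLAST-style output format 6, which DIAMOND can also write), that both mates of a pair share the same read identifier, and that "congruent" means both mates have their best hit on the same database entry. The function names and the cutoff value `MAX_EVALUE` are hypothetical; the abstract says only that the threshold should be stricter than the default and does not give a number.

```python
import csv
import io

# Column order of the standard 12-column tabular alignment format (outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tsv_handle, max_evalue):
    """Map each read id to (subject id, bitscore) for its top-scoring hit,
    considering only hits with evalue <= max_evalue."""
    best = {}
    reader = csv.DictReader(tsv_handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        if float(row["evalue"]) > max_evalue:
            continue  # hit fails the e-value cutoff
        score = float(row["bitscore"])
        read_id = row["qseqid"]
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (row["sseqid"], score)
    return best


def congruent_counts(r1_hits, r2_hits):
    """Count read pairs whose two mates, mapped independently, share the
    same best-hit subject (the assumed meaning of 'congruent')."""
    counts = {}
    for read_id, (subject, _score) in r1_hits.items():
        mate = r2_hits.get(read_id)
        if mate is not None and mate[0] == subject:
            counts[subject] = counts.get(subject, 0) + 1
    return counts


# Toy alignments, invented for illustration only.
R1 = ("read1\tgeneA\t95.0\t50\t2\t0\t1\t150\t10\t59\t1e-20\t90.0\n"
      "read2\tgeneB\t80.0\t40\t8\t0\t1\t120\t5\t44\t1e-3\t40.0\n")
R2 = ("read1\tgeneA\t93.0\t48\t3\t0\t1\t144\t60\t107\t1e-18\t85.0\n"
      "read2\tgeneB\t82.0\t45\t8\t0\t1\t135\t50\t94\t1e-12\t60.0\n")

MAX_EVALUE = 1e-10  # hypothetical cutoff, not a value taken from the article

r1 = best_hits(io.StringIO(R1), MAX_EVALUE)
r2 = best_hits(io.StringIO(R2), MAX_EVALUE)
print(congruent_counts(r1, r2))
```

In the toy data, the forward mate of read2 fails the cutoff, so only the read1 pair is counted for geneA. Raising or lowering `MAX_EVALUE` shows how the threshold changes which pairs count as detections.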