A parsimony approach to biological pathway reconstruction/inference for genomes and metagenomes.

A common biological pathway reconstruction approach -- as implemented by many automatic biological pathway services (such as the KAAS and RAST servers) and the functional annotation of metagenomic sequences -- starts with the identification of protein functions or families (e.g., KO families for the...


Bibliographic Details
Main Authors: Yuzhen Ye, Thomas G Doak
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2009-08-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC2714467?pdf=render
id doaj-b7384ee297e24e6ebb534c625c3238d0
record_format Article
spelling doaj-b7384ee297e24e6ebb534c625c3238d0 | 2020-11-25T01:53:27Z | eng | Public Library of Science (PLoS) | PLoS Computational Biology | 1553-734X | 1553-7358 | 2009-08-01 | 5 | 8 | e1000465 | 10.1371/journal.pcbi.1000465 | A parsimony approach to biological pathway reconstruction/inference for genomes and metagenomes. | Yuzhen Ye | Thomas G Doak | http://europepmc.org/articles/PMC2714467?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Yuzhen Ye
Thomas G Doak
title A parsimony approach to biological pathway reconstruction/inference for genomes and metagenomes.
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2009-08-01
description A common biological pathway reconstruction approach -- as implemented by many automatic biological pathway services (such as the KAAS and RAST servers) and the functional annotation of metagenomic sequences -- starts with the identification of protein functions or families (e.g., KO families for the KEGG database and the FIG families for the SEED database) in the query sequences, followed by a direct mapping of the identified protein families onto pathways. Given a predicted patchwork of individual biochemical steps, some metric must be applied in deciding what pathways actually exist in the genome or metagenome represented by the sequences. Commonly, and straightforwardly, a complete biological pathway can be identified in a dataset if at least one of the steps associated with the pathway is found. We report, however, that this naïve mapping approach leads to an inflated estimate of biological pathways, and thus overestimates the functional diversity of the sample from which the DNA sequences are derived. We developed a parsimony approach, called MinPath (Minimal set of Pathways), for biological pathway reconstructions using protein family predictions, which yields a more conservative, yet more faithful, estimation of the biological pathways for a query dataset. MinPath identified far fewer pathways for the genomes collected in the KEGG database -- as compared to the naïve mapping approach -- eliminating some obviously spurious pathway annotations. Results from applying MinPath to several metagenomes indicate that the common methods used for metagenome annotation may significantly overestimate the biological pathways encoded by microbial communities.
url http://europepmc.org/articles/PMC2714467?pdf=render
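
The contrast drawn in the description above, between naive mapping (a pathway is reported if at least one of its steps is found) and a parsimony reconstruction (the smallest set of pathways that explains the observed protein families), can be illustrated with a minimal sketch. This is not the MinPath implementation: the record does not describe how the minimal set is computed, so exhaustive search is used here as a stand-in that only scales to toy inputs. The function names, family identifiers (`K1` to `K5`), and pathway names are all hypothetical.

```python
from itertools import combinations


def naive_pathways(families, pathway_members):
    """Naive mapping: report every pathway with at least one observed family."""
    return {p for p, members in pathway_members.items() if members & families}


def minimal_pathways(families, pathway_members):
    """Return a smallest set of pathways covering every mappable observed family.

    Assumption: exhaustive subset search stands in for whatever solver the
    published tool uses; it is only practical for very small examples.
    """
    candidates = sorted(naive_pathways(families, pathway_members))
    # Families that belong to no pathway cannot be explained and are ignored.
    mappable = {f for f in families
                if any(f in pathway_members[p] for p in candidates)}
    # Try subsets in order of increasing size; the first cover found is minimal.
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            covered = set().union(*(pathway_members[p] for p in subset))
            if mappable <= covered:
                return set(subset)
    return set(candidates)


# Hypothetical toy data: three pathways sharing some protein families.
pathway_members = {
    "pathway_A": {"K1", "K2", "K3"},
    "pathway_B": {"K3", "K4"},
    "pathway_C": {"K2", "K5"},
}
observed = {"K1", "K2", "K3", "K4"}

naive = naive_pathways(observed, pathway_members)
minimal = minimal_pathways(observed, pathway_members)
print(sorted(naive))
print(sorted(minimal))
```

In this toy case the naive mapping reports all three pathways, because `pathway_C` shares one family (`K2`) with the observed set, while the parsimony search keeps only `pathway_A` and `pathway_B`, which together already account for every observed family.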