Protein molecular function prediction by Bayesian phylogenomics.

We present a statistical graphical model to infer specific molecular function for unannotated protein sequences using homology. Based on phylogenomic principles, SIFTER (Statistical Inference of Function Through Evolutionary Relationships) accurately predicts molecular function for members of a prot...

Bibliographic Details
Main Authors: Barbara E Engelhardt, Michael I Jordan, Kathryn E Muratore, Steven E Brenner
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2005-10-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC1246806?pdf=render
id doaj-bcec9c085d7c46e5b7bd50186b444dc5
record_format Article
spelling doaj-bcec9c085d7c46e5b7bd50186b444dc5; 2020-11-24T21:58:58Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2005-10-01; vol. 1, no. 5, e45; 10.1371/journal.pcbi.0010045; Protein molecular function prediction by Bayesian phylogenomics.; Barbara E Engelhardt; Michael I Jordan; Kathryn E Muratore; Steven E Brenner; http://europepmc.org/articles/PMC1246806?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Barbara E Engelhardt
Michael I Jordan
Kathryn E Muratore
Steven E Brenner
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2005-10-01
description We present a statistical graphical model to infer specific molecular function for unannotated protein sequences using homology. Based on phylogenomic principles, SIFTER (Statistical Inference of Function Through Evolutionary Relationships) accurately predicts molecular function for members of a protein family given a reconciled phylogeny and available function annotations, even when the data are sparse or noisy. Our method produced specific and consistent molecular function predictions across 100 Pfam families in comparison to the Gene Ontology annotation database, BLAST, GOtcha, and Orthostrapper. We performed a more detailed exploration of functional predictions on the adenosine-5'-monophosphate/adenosine deaminase family and the lactate/malate dehydrogenase family, in the former case comparing the predictions against a gold standard set of published functional characterizations. Given function annotations for 3% of the proteins in the deaminase family, SIFTER achieves 96% accuracy in predicting molecular function for experimentally characterized proteins as reported in the literature. The accuracy of SIFTER on this dataset is a significant improvement over other currently available methods such as BLAST (75%), GeneQuiz (64%), GOtcha (89%), and Orthostrapper (11%). We also experimentally characterized the adenosine deaminase from Plasmodium falciparum, confirming SIFTER's prediction. The results illustrate the predictive power of exploiting a statistical model of function evolution in phylogenomic problems. A software implementation of SIFTER is available from the authors.
url http://europepmc.org/articles/PMC1246806?pdf=render