Predicting protein linkages in bacteria: Which method is best depends on task

Abstract. Background: Applications of computational methods for predicting protein functional linkages are increasing. In recent years, several bacteria-specific methods for predicting linkages have been developed. The four major genomic context methods a...

Bibliographic Details
Main Authors: Leach Sonia M, Karimpour-Fard Anis, Gill Ryan T, Hunter Lawrence E
Format: Article
Language: English
Published: BMC 2008-09-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/9/397
id doaj-36c18239e43f41afa91020929eebb982
record_format Article
spelling doaj-36c18239e43f41afa91020929eebb982; 2020-11-25T00:20:27Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2008-09-01; 9; 1; 397; 10.1186/1471-2105-9-397; Predicting protein linkages in bacteria: Which method is best depends on task; Leach Sonia M; Karimpour-Fard Anis; Gill Ryan T; Hunter Lawrence E; http://www.biomedcentral.com/1471-2105/9/397
collection DOAJ
language English
format Article
sources DOAJ
author Leach Sonia M
Karimpour-Fard Anis
Gill Ryan T
Hunter Lawrence E
title Predicting protein linkages in bacteria: Which method is best depends on task
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2008-09-01
description Abstract
Background: Applications of computational methods for predicting protein functional linkages are increasing. In recent years, several bacteria-specific methods for predicting linkages have been developed. The four major genomic context methods are Gene cluster, Gene neighbor, Rosetta Stone, and Phylogenetic profiles. These methods have been shown to be powerful tools, and this paper provides guidelines for when each method is appropriate by exploring different features of each method and the potential improvements offered by their combination. We also review many previous treatments of these prediction methods, use the latest available annotations, and offer a number of new observations.
Results: Using Escherichia coli K12 and Bacillus subtilis, linkage predictions made by each of these methods were evaluated against three benchmarks: functional categories defined by COG and KEGG, known pathways listed in EcoCyc, and known operons listed in RegulonDB. Each evaluated method had strengths and weaknesses, with no one method dominating all aspects of predictive ability studied. For functional categories, as previous studies have shown, the Rosetta Stone method was individually best at detecting linkages and predicting functions among proteins with shared KEGG categories, while the Phylogenetic profile method was best for linkage detection and function prediction among proteins with common COG functions. Differences in performance under COG versus KEGG may be attributable to the presence of paralogs. Function prediction was better with a weighted combination of linkages based on reliability than with a simple unweighted union of the linkage sets. For pathway reconstruction, 99 complete metabolic pathways in E. coli K12 (out of the 209 known, non-trivial pathways) and 193 pathways with 50% of their proteins were covered by linkages from at least one method. Gene neighbor was most effective individually on pathway reconstruction, with 48 complete pathways reconstructed. For operon prediction, Gene cluster completely predicted 59% of the known operons in E. coli K12 and 88% (333/418) in B. subtilis. Comparing two versions of the E. coli K12 operon database, many of the unannotated predictions in the earlier version were updated to true predictions in the later version. Using only linkages found by both Gene cluster and Gene neighbor improved the precision of operon predictions. Additionally, as previous studies have shown, combining features based on intergenic region and protein function improved the specificity of operon prediction.
Conclusion: A common problem for computational methods is the generation of a large number of false positives, which might be caused by an incomplete source of validation. By comparing two versions of a database, we demonstrated dramatic differences in reported results. We used several benchmarks to show the comparative effectiveness of each prediction method, and we provide guidelines as to which method is most appropriate for a given prediction task.
url http://www.biomedcentral.com/1471-2105/9/397