Predicting protein linkages in bacteria: Which method is best depends on task

Abstract. Background: Applications of computational methods for predicting protein functional linkages are increasing. In recent years, several bacteria-specific methods for predicting linkages have been developed. The four major genomic context methods a...

Bibliographic Details
Main Authors:	Leach Sonia M, Karimpour-Fard Anis, Gill Ryan T, Hunter Lawrence E
Format:	Article
Language:	English
Published:	BMC 2008-09-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/9/397

id	doaj-36c18239e43f41afa91020929eebb982
record_format	Article
spelling	doaj-36c18239e43f41afa91020929eebb982 2020-11-25T00:20:27Z eng BMC BMC Bioinformatics 1471-2105 2008-09-01 9 1 397 10.1186/1471-2105-9-397 Predicting protein linkages in bacteria: Which method is best depends on task Leach Sonia M Karimpour-Fard Anis Gill Ryan T Hunter Lawrence E http://www.biomedcentral.com/1471-2105/9/397
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Leach Sonia M Karimpour-Fard Anis Gill Ryan T Hunter Lawrence E
title	Predicting protein linkages in bacteria: Which method is best depends on task
publisher	BMC
series	BMC Bioinformatics
issn	1471-2105
publishDate	2008-09-01
description	Abstract. Background: Applications of computational methods for predicting protein functional linkages are increasing. In recent years, several bacteria-specific methods for predicting linkages have been developed. The four major genomic context methods are Gene cluster, Gene neighbor, Rosetta Stone, and Phylogenetic profiles. These methods have been shown to be powerful tools, and this paper provides guidelines for when each method is appropriate by exploring different features of each method and potential improvements offered by their combination. We also review many previous treatments of these prediction methods, use the latest available annotations, and offer a number of new observations. Results: Using Escherichia coli K12 and Bacillus subtilis, linkage predictions made by each of these methods were evaluated against three benchmarks: functional categories defined by COG and KEGG, known pathways listed in EcoCyc, and known operons listed in RegulonDB. Each evaluated method had strengths and weaknesses, with no one method dominating all aspects of predictive ability studied. For functional categories, as previous studies have shown, the Rosetta Stone method was individually best at detecting linkages and predicting functions among proteins with shared KEGG categories, while the Phylogenetic profile method was best for linkage detection and function prediction among proteins with common COG functions. Differences in performance under COG versus KEGG may be attributable to the presence of paralogs. Better function prediction was observed when using a weighted combination of linkages based on reliability versus using a simple unweighted union of the linkage sets. For pathway reconstruction, 99 complete metabolic pathways in E. coli K12 (out of the 209 known, non-trivial pathways) and 193 pathways with 50% of their proteins were covered by linkages from at least one method. Gene neighbor was most effective individually on pathway reconstruction, with 48 complete pathways reconstructed. For operon prediction, Gene cluster completely predicted 59% of the known operons in E. coli K12 and 88% (333/418) in B. subtilis. Comparing two versions of the E. coli K12 operon database, many of the unannotated predictions in the earlier version were updated to true predictions in the later version. Using only linkages found by both Gene cluster and Gene neighbor improved the precision of operon predictions. Additionally, as previous studies have shown, combining features based on intergenic region and protein function improved the specificity of operon prediction. Conclusion: A common problem for computational methods is the generation of a large number of false positives that might be caused by an incomplete source of validation. By comparing two versions of a database, we demonstrated the dramatic differences in reported results. We used several benchmarks on which we have shown the comparative effectiveness of each prediction method, as well as provided guidelines as to which method is most appropriate for a given prediction task.
url	http://www.biomedcentral.com/1471-2105/9/397

Similar Items