A functional hierarchical organization of the protein sequence space

Abstract. Background. It is a major challenge of computational biology to provide a comprehensive functional classification of all known proteins. Most existing methods seek recurrent patterns in known proteins based on manually-validated alignments of kn...


Bibliographic Details
Main Authors: Fromer Menachem, Friedlich Moriah, Kaplan Noam, Linial Michal
Format: Article
Language: English
Published: BMC 2004-12-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/5/196
id doaj-ad0331e9418c4b19a6d98ae810d9225f
record_format Article
spelling doaj-ad0331e9418c4b19a6d98ae810d9225f | 2020-11-25T00:23:23Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2004-12-01 | 5 | 1 | 196 | 10.1186/1471-2105-5-196 | A functional hierarchical organization of the protein sequence space | Fromer Menachem | Friedlich Moriah | Kaplan Noam | Linial Michal | <p>Abstract</p> <p>Background</p> <p>It is a major challenge of computational biology to provide a comprehensive functional classification of all known proteins. Most existing methods seek recurrent patterns in known proteins based on manually-validated alignments of known protein families. Such methods can achieve high sensitivity, but are limited by the necessary manual labor. This makes our current view of the protein world incomplete and biased. This paper concerns ProtoNet, an automatic unsupervised global clustering system that generates a hierarchical tree of over 1,000,000 proteins, based solely on sequence similarity.</p> <p>Results</p> <p>In this paper we show that ProtoNet correctly captures functional and structural aspects of the protein world. Furthermore, a novel feature is an automatic procedure that reduces the tree to 12% of its original size. This procedure utilizes only parameters intrinsic to the clustering process. Despite the substantial reduction in size, the system's predictive power concerning biological functions is hardly affected. We then carry out an automatic comparison with existing functional protein annotations. Consequently, 78% of the clusters in the compressed tree (5,300 clusters) are assigned a biological function with high confidence. The clustering and compression processes are unsupervised and robust.</p> <p>Conclusions</p> <p>We present an automatically generated unbiased method that provides a hierarchical classification of all currently known proteins.</p> | http://www.biomedcentral.com/1471-2105/5/196
collection DOAJ
language English
format Article
sources DOAJ
author Fromer Menachem
Friedlich Moriah
Kaplan Noam
Linial Michal
spellingShingle Fromer Menachem
Friedlich Moriah
Kaplan Noam
Linial Michal
A functional hierarchical organization of the protein sequence space
BMC Bioinformatics
author_facet Fromer Menachem
Friedlich Moriah
Kaplan Noam
Linial Michal
author_sort Fromer Menachem
title A functional hierarchical organization of the protein sequence space
title_short A functional hierarchical organization of the protein sequence space
title_full A functional hierarchical organization of the protein sequence space
title_fullStr A functional hierarchical organization of the protein sequence space
title_full_unstemmed A functional hierarchical organization of the protein sequence space
title_sort functional hierarchical organization of the protein sequence space
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2004-12-01
description <p>Abstract</p> <p>Background</p> <p>It is a major challenge of computational biology to provide a comprehensive functional classification of all known proteins. Most existing methods seek recurrent patterns in known proteins based on manually-validated alignments of known protein families. Such methods can achieve high sensitivity, but are limited by the necessary manual labor. This makes our current view of the protein world incomplete and biased. This paper concerns ProtoNet, an automatic unsupervised global clustering system that generates a hierarchical tree of over 1,000,000 proteins, based solely on sequence similarity.</p> <p>Results</p> <p>In this paper we show that ProtoNet correctly captures functional and structural aspects of the protein world. Furthermore, a novel feature is an automatic procedure that reduces the tree to 12% of its original size. This procedure utilizes only parameters intrinsic to the clustering process. Despite the substantial reduction in size, the system's predictive power concerning biological functions is hardly affected. We then carry out an automatic comparison with existing functional protein annotations. Consequently, 78% of the clusters in the compressed tree (5,300 clusters) are assigned a biological function with high confidence. The clustering and compression processes are unsupervised and robust.</p> <p>Conclusions</p> <p>We present an automatically generated unbiased method that provides a hierarchical classification of all currently known proteins.</p>
url http://www.biomedcentral.com/1471-2105/5/196
work_keys_str_mv AT fromermenachem afunctionalhierarchicalorganizationoftheproteinsequencespace
AT friedlichmoriah afunctionalhierarchicalorganizationoftheproteinsequencespace
AT kaplannoam afunctionalhierarchicalorganizationoftheproteinsequencespace
AT linialmichal afunctionalhierarchicalorganizationoftheproteinsequencespace
AT fromermenachem functionalhierarchicalorganizationoftheproteinsequencespace
AT friedlichmoriah functionalhierarchicalorganizationoftheproteinsequencespace
AT kaplannoam functionalhierarchicalorganizationoftheproteinsequencespace
AT linialmichal functionalhierarchicalorganizationoftheproteinsequencespace
_version_ 1725357410028093440