Large scale hierarchical clustering of protein sequences
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2005-01-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/6/15 |
Summary: | Background: Searching a biological sequence database with a query sequence looking for homologues has become a routine operation in computational biology. In spite of the high degree of sophistication of currently available search routines it is still virtually impossible to identify quickly and clearly a group of sequences that a given query sequence belongs to. Results: We report on our developments in grouping all known protein sequences hierarchically into superfamily and family clusters. Our graph-based algorithms take into account the topology of the sequence space induced by the data itself to construct a biologically meaningful partitioning. We have applied our clustering procedures to a non-redundant set of about 1,000,000 sequences resulting in a hierarchical clustering which is being made available for querying and browsing at http://systers.molgen.mpg.de/. Conclusions: Comparisons with other widely used clustering methods on various data sets show the abilities and strengths of our clustering methods in producing a biologically meaningful grouping of protein sequences. |
ISSN: | 1471-2105 |