Large scale hierarchical clustering of protein sequences

Abstract. Background: Searching a biological sequence database with a query sequence looking for homologues has become a routine operation in computational biology. In spite of the high degree of sophistication of currently available search routines it is...

Bibliographic Details
Main Authors: Stoye Jens, Krause Antje, Vingron Martin
Format: Article
Language: English
Published: BMC 2005-01-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/6/15
id doaj-faafca5eb54d4019bd9287d367d5b609
record_format Article
spelling doaj-faafca5eb54d4019bd9287d367d5b609 2020-11-24T22:38:51Z eng BMC BMC Bioinformatics 1471-2105 2005-01-01 6 1 15 10.1186/1471-2105-6-15 Large scale hierarchical clustering of protein sequences Stoye Jens Krause Antje Vingron Martin http://www.biomedcentral.com/1471-2105/6/15
collection DOAJ
language English
format Article
sources DOAJ
author Stoye Jens
Krause Antje
Vingron Martin
title Large scale hierarchical clustering of protein sequences
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2005-01-01
description Abstract. Background: Searching a biological sequence database with a query sequence looking for homologues has become a routine operation in computational biology. In spite of the high degree of sophistication of currently available search routines it is still virtually impossible to identify quickly and clearly a group of sequences that a given query sequence belongs to. Results: We report on our developments in grouping all known protein sequences hierarchically into superfamily and family clusters. Our graph-based algorithms take into account the topology of the sequence space induced by the data itself to construct a biologically meaningful partitioning. We have applied our clustering procedures to a non-redundant set of about 1,000,000 sequences resulting in a hierarchical clustering which is being made available for querying and browsing at http://systers.molgen.mpg.de/. Conclusions: Comparisons with other widely used clustering methods on various data sets show the abilities and strengths of our clustering methods in producing a biologically meaningful grouping of protein sequences.
url http://www.biomedcentral.com/1471-2105/6/15