Large scale hierarchical clustering of protein sequences
<p>Abstract</p> <p>Background</p> <p>Searching a biological sequence database with a query sequence looking for homologues has become a routine operation in computational biology. In spite of the high degree of sophistication of currently available search routines it is...
Main Authors: | Jens Stoye, Antje Krause, Martin Vingron |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2005-01-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/6/15 |
id | doaj-faafca5eb54d4019bd9287d367d5b609 |
---|---|
doi | 10.1186/1471-2105-6-15 |
volume | 6 |
issue | 1 |
article | 15 |
collection | DOAJ |
author | Stoye, Jens; Krause, Antje; Vingron, Martin |
issn | 1471-2105 |
description | <p>Abstract</p> <p>Background</p> <p>Searching a biological sequence database with a query sequence looking for homologues has become a routine operation in computational biology. In spite of the high degree of sophistication of currently available search routines it is still virtually impossible to identify quickly and clearly a group of sequences that a given query sequence belongs to.</p> <p>Results</p> <p>We report on our developments in grouping all known protein sequences hierarchically into superfamily and family clusters. Our graph-based algorithms take into account the topology of the sequence space induced by the data itself to construct a biologically meaningful partitioning. We have applied our clustering procedures to a non-redundant set of about 1,000,000 sequences resulting in a hierarchical clustering which is being made available for querying and browsing at <url>http://systers.molgen.mpg.de/</url>.</p> <p>Conclusions</p> <p>Comparisons with other widely used clustering methods on various data sets show the abilities and strengths of our clustering methods in producing a biologically meaningful grouping of protein sequences.</p> |
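
The abstract states only that the clustering is graph-based and produces a hierarchy of superfamily and family clusters; it does not spell out the algorithm. As a generic illustration, and not the SYSTERS procedure itself, the Python sketch below groups sequences by single linkage, i.e. as connected components of a similarity graph thresholded at two levels: a looser score threshold for superfamily-level clusters and a stricter one for family-level subclusters inside each. The function names, the pairwise score dictionary and the threshold values are all hypothetical.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Connected components of an undirected graph via union-find.

    Every endpoint in `edges` must also appear in `nodes`.
    Returns a sorted list of sorted member lists.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return sorted(sorted(g) for g in groups.values())


def hierarchical_clusters(nodes, scores, loose, strict):
    """Hypothetical two-level single-linkage grouping (not the SYSTERS method).

    `scores` maps a sequence pair (a, b) to a similarity score, higher meaning
    more similar. Top-level clusters are the components of the graph keeping
    edges with score >= loose; each is then split into subclusters using only
    edges with score >= strict.
    """
    if strict < loose:
        raise ValueError("the finer level needs the stricter threshold")
    top_edges = [pair for pair, s in scores.items() if s >= loose]
    hierarchy = []
    for cluster in connected_components(nodes, top_edges):
        members = set(cluster)
        sub_edges = [(a, b) for (a, b), s in scores.items()
                     if s >= strict and a in members and b in members]
        hierarchy.append((cluster, connected_components(cluster, sub_edges)))
    return hierarchy


if __name__ == "__main__":
    # Toy data: made-up identifiers and scores, for illustration only.
    seqs = ["A", "B", "C", "D", "E"]
    scores = {("A", "B"): 90.0, ("B", "C"): 40.0, ("D", "E"): 35.0}
    for superfamily, families in hierarchical_clusters(seqs, scores, 30.0, 80.0):
        print(superfamily, families)
```

Run as a script, the example prints `['A', 'B', 'C'] [['A', 'B'], ['C']]` followed by `['D', 'E'] [['D'], ['E']]`. Single linkage is used here only because it is the simplest graph-based rule; it is known to be prone to chaining, where one spurious high-scoring edge merges otherwise unrelated groups.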