Automatic classification of protein structures relying on similarities between alignments

Abstract. Background: Identifying protein structural cores requires isolating sets of proteins that all share the same subset of structural motifs. In the context of an ever-growing number of available 3D protein structures, standard and automatic clu...


Bibliographic Details
Main Authors: Santini Guillaume, Soldano Henry, Pothier Joël
Format: Article
Language: English
Published: BMC 2012-09-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/13/233
id doaj-95289dd64ac44c6b98e799bca9f13862
record_format Article
spelling doaj-95289dd64ac44c6b98e799bca9f13862
2020-11-24T23:17:12Z
eng
BMC
BMC Bioinformatics
1471-2105
2012-09-01
13
1
233
10.1186/1471-2105-13-233
Automatic classification of protein structures relying on similarities between alignments
Santini Guillaume
Soldano Henry
Pothier Joël
http://www.biomedcentral.com/1471-2105/13/233
collection DOAJ
language English
format Article
sources DOAJ
author Santini Guillaume
Soldano Henry
Pothier Joël
title Automatic classification of protein structures relying on similarities between alignments
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2012-09-01
description Background: Identifying protein structural cores requires isolating sets of proteins that all share the same subset of structural motifs. With an ever-growing number of available 3D protein structures, standard automatic clustering algorithms need to be adapted to identify such sets of proteins efficiently.
Results: Two 3D structures are declared similar or not according to the local similarities of their matching substructures in a structural alignment. This binary relation can be represented as a graph of similarities in which a node represents a 3D protein structure and an edge states that two 3D protein structures are similar. Classifying proteins into structural families can therefore be viewed as a graph clustering task. Unfortunately, because such a graph encodes only pairwise similarity information, clustering algorithms may place in the same cluster a subset of 3D structures that do not share a common substructure. To overcome this drawback, we first define a ternary similarity on a triple of 3D structures as a constraint to be satisfied by the graph of similarities. This ternary constraint takes into account similarities between pairwise alignments, so as to ensure that the three protein structures involved do have some common substructure. We propose a modification algorithm that eliminates edges from the original graph of similarities and yields a reduced graph in which no ternary constraint is violated. Our approach is thus first to build a graph of similarities, then to reduce the graph with the modification algorithm, and finally to apply a standard graph clustering algorithm to the reduced graph. This method was used to classify ASTRAL-40 non-redundant protein domains, with significant pairwise similarities identified by Yakusa, a program devised for rapid 3D structure alignments.
Conclusions: We show that filtering similarities with ternary similarity constraints before a standard graph-based clustering process (i) improves the separation of proteins of different classes and consequently (ii) improves the classification quality of standard graph-based clustering algorithms with respect to the reference classification SCOP.
url http://www.biomedcentral.com/1471-2105/13/233
_version_ 1725584307876003840
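
The three-step pipeline outlined in the abstract (build a graph of similarities, remove edges until no ternary constraint is violated, then cluster the reduced graph) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not define the ternary similarity, so the residue-overlap test `violates`, the `min_common` threshold, the greedy removal of the lower-scoring edge, the use of connected components as the "standard" clustering step, and the alignment data layout are all hypothetical. None of it reproduces the authors' algorithm or the output format of Yakusa.

```python
from itertools import combinations


def edge_key(a, b):
    """Canonical key for an undirected edge between two structure names."""
    return (a, b) if a <= b else (b, a)


def violates(alignments, a, b, c, min_common):
    """Hypothetical ternary test on the path a-b-c (NOT the paper's definition):
    the residues of b aligned with a and the residues of b aligned with c
    must share at least `min_common` positions."""
    res_ab = alignments[edge_key(a, b)]["residues"][b]
    res_bc = alignments[edge_key(b, c)]["residues"][b]
    return len(res_ab & res_bc) < min_common


def reduce_graph(alignments, min_common=3):
    """Assumed greedy reduction: while some pair of adjacent edges violates the
    test, delete the lower-scoring edge of that pair, then rescan."""
    kept = dict(alignments)  # shallow copy; the input mapping is not modified
    changed = True
    while changed:
        changed = False
        neighbours = {}
        for a, b in kept:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
        for b, adj in neighbours.items():
            for a, c in combinations(sorted(adj), 2):
                if violates(kept, a, b, c, min_common):
                    pair = (edge_key(a, b), edge_key(b, c))
                    worst = min(pair, key=lambda e: kept[e]["score"])
                    del kept[worst]
                    changed = True
                    break
            if changed:
                break  # adjacency is stale after a deletion; rebuild it
    return kept


def connected_components(nodes, edges):
    """Stand-in clustering step: connected components via union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())


# Toy data (invented): each edge stores a score and, per structure,
# the set of residue indices covered by the pairwise alignment.
example = {
    ("d1", "d2"): {"score": 0.9,
                   "residues": {"d1": set(range(10)), "d2": set(range(10))}},
    ("d2", "d3"): {"score": 0.8,
                   "residues": {"d2": set(range(5, 15)), "d3": set(range(10))}},
    ("d3", "d4"): {"score": 0.4,
                   "residues": {"d3": set(range(20, 30)), "d4": set(range(10))}},
}

reduced = reduce_graph(example, min_common=3)
clusters = connected_components(["d1", "d2", "d3", "d4"], reduced)
print(clusters)  # [['d1', 'd2', 'd3'], ['d4']]
```

In this toy case the alignments d2-d3 and d3-d4 cover disjoint regions of d3, so the lower-scoring edge d3-d4 is dropped and d4 ends up in its own cluster. Clustering the unreduced graph with connected components would have merged all four structures.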