Automatic classification of protein structures relying on similarities between alignments

Abstract. Background: Identification of protein structural cores requires isolation of sets of proteins all sharing the same subset of structural motifs. In the context of an ever growing number of available 3D protein structures, standard and automatic clu...

Bibliographic Details
Main Authors:	Santini Guillaume, Soldano Henry, Pothier Joël
Format:	Article
Language:	English
Published:	BMC 2012-09-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/13/233

id	doaj-95289dd64ac44c6b98e799bca9f13862
record_format	Article
spelling	doaj-95289dd64ac44c6b98e799bca9f13862 2020-11-24T23:17:12Z eng BMC BMC Bioinformatics 1471-2105 2012-09-01 13 1 233 10.1186/1471-2105-13-233 Automatic classification of protein structures relying on similarities between alignments Santini Guillaume Soldano Henry Pothier Joël http://www.biomedcentral.com/1471-2105/13/233
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Santini Guillaume Soldano Henry Pothier Joël
spellingShingle	Santini Guillaume Soldano Henry Pothier Joël Automatic classification of protein structures relying on similarities between alignments BMC Bioinformatics
author_facet	Santini Guillaume Soldano Henry Pothier Joël
author_sort	Santini Guillaume
title	Automatic classification of protein structures relying on similarities between alignments
title_short	Automatic classification of protein structures relying on similarities between alignments
title_full	Automatic classification of protein structures relying on similarities between alignments
title_fullStr	Automatic classification of protein structures relying on similarities between alignments
title_full_unstemmed	Automatic classification of protein structures relying on similarities between alignments
title_sort	automatic classification of protein structures relying on similarities between alignments
publisher	BMC
series	BMC Bioinformatics
issn	1471-2105
publishDate	2012-09-01
description	Abstract
Background: Identification of protein structural cores requires isolating sets of proteins that all share the same subset of structural motifs. As the number of available 3D protein structures keeps growing, standard automatic clustering algorithms must be adapted so that such sets of proteins can be identified efficiently.
Results: Two 3D structures are declared similar or not according to the local similarities of their matching substructures in a structural alignment. This binary relation can be represented as a graph of similarities in which a node represents a 3D protein structure and an edge states that two 3D protein structures are similar. Classifying proteins into structural families can therefore be viewed as a graph clustering task. Unfortunately, because such a graph encodes only pairwise similarity information, clustering algorithms may place in the same cluster a subset of 3D structures that do not share a common substructure. To overcome this drawback, we first define a ternary similarity on a triple of 3D structures as a constraint to be satisfied by the graph of similarities. This ternary constraint takes into account similarities between pairwise alignments, so as to ensure that the three protein structures involved do have some common substructure. We propose a modification algorithm that eliminates edges from the original graph of similarities and yields a reduced graph in which no ternary constraint is violated. Our approach is first to build a graph of similarities, then to reduce the graph with the modification algorithm, and finally to apply a standard graph clustering algorithm to the reduced graph. This method was used to classify ASTRAL-40 non-redundant protein domains, with significant pairwise similarities identified by Yakusa, a program devised for rapid 3D structure alignments.
Conclusions: We show that filtering similarities with ternary similarity constraints before a standard graph-based clustering process i) improves the separation of proteins of different classes and consequently ii) improves the classification quality of standard graph-based clustering algorithms with respect to the reference classification SCOP.
url	http://www.biomedcentral.com/1471-2105/13/233
work_keys_str_mv	AT santiniguillaume automaticclassificationofproteinstructuresrelyingonsimilaritiesbetweenalignments AT soldanohenry automaticclassificationofproteinstructuresrelyingonsimilaritiesbetweenalignments AT pothierjoel automaticclassificationofproteinstructuresrelyingonsimilaritiesbetweenalignments
_version_	1725584307876003840
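
Illustrative sketch (not part of the record): the abstract describes reducing a graph of similarities until no triple of mutually similar structures violates a ternary constraint. The abstract does not give the exact constraint or the edge-elimination rule, so everything below is an assumption made for illustration: each alignment is stored as the set of aligned residue indices per structure, a triple is accepted when each structure's two alignments overlap on at least `min_core` residues, and edges are removed greedily. The names `edge`, `triangle_ok`, `reduce_graph` and `min_core` are hypothetical.

```python
from itertools import combinations


def edge(a, b):
    """Undirected edge between two structure names."""
    return frozenset((a, b))


def triangle_ok(alignments, a, b, c, min_core):
    """Assumed ternary test: for each structure of the triple, the residues
    aligned with the two other structures must overlap on >= min_core positions."""
    for pivot, q, r in ((a, b, c), (b, a, c), (c, a, b)):
        common = alignments[edge(pivot, q)][pivot] & alignments[edge(pivot, r)][pivot]
        if len(common) < min_core:
            return False
    return True


def reduce_graph(alignments, min_core=3):
    """Greedy stand-in for the modification algorithm: repeatedly drop the edge
    involved in the most violated triangles until no violation remains."""
    kept = dict(alignments)
    while True:
        nodes = sorted({n for e in kept for n in e})
        violations = {}
        for a, b, c in combinations(nodes, 3):
            edges = [edge(a, b), edge(a, c), edge(b, c)]
            if all(e in kept for e in edges) and not triangle_ok(kept, a, b, c, min_core):
                for e in edges:
                    violations[e] = violations.get(e, 0) + 1
        if not violations:
            return kept
        # Ties are broken by name only to keep the sketch deterministic.
        worst = max(violations, key=lambda e: (violations[e], sorted(e)))
        del kept[worst]


# Toy data: A-B and A-C align on the same region of A, but B-C aligns on
# different regions of B and C, so the triple shares no common substructure.
alignments = {
    edge("A", "B"): {"A": set(range(1, 6)), "B": set(range(1, 6))},
    edge("A", "C"): {"A": set(range(1, 6)), "C": set(range(10, 15))},
    edge("B", "C"): {"B": set(range(20, 26)), "C": set(range(1, 7))},
}
reduced = reduce_graph(alignments)
print(sorted(sorted(e) for e in reduced))  # [['A', 'B'], ['A', 'C']]
```

In the pipeline the abstract outlines, a standard graph clustering algorithm would then be run on the reduced graph; that step is left out here because the record does not say which algorithm was used.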

Similar Items