Ordering Protein Contact Matrices

Numerous biophysical approaches provide information about the spatial proximity of residues in proteins. However, correct assignment of the protein fold from this proximity information is not straightforward if the spatially close protein residues are not assigned to residues in the primary sequence. Here,...

Bibliographic Details
Main Authors: Chuan Xu, Guillaume Bouvier, Benjamin Bardiaux, Michael Nilges, Thérèse Malliavin, Abdel Lisser
Format: Article
Language: English
Published: Elsevier 2018-01-01
Series: Computational and Structural Biotechnology Journal
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037017301058
id doaj-b808286a6ee842859dd7c9cad1b709a9
record_format Article
spelling doaj-b808286a6ee842859dd7c9cad1b709a9; 2020-11-25T01:18:41Z; eng; Elsevier; Computational and Structural Biotechnology Journal; ISSN 2001-0370; 2018-01-01; vol. 16, pp. 140-156
Ordering Protein Contact Matrices
Chuan Xu (0), Guillaume Bouvier (1), Benjamin Bardiaux (2), Michael Nilges (3), Thérèse Malliavin (4), Abdel Lisser (5)
(0) Laboratoire de Recherche en Informatique, Université Paris-Sud and CNRS UMR8623, France
(1) Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR3528, France; Centre de Bioinformatique, Biostatistique et Biologie Intégrative, Institut Pasteur and CNRS USR3756, France
(2) Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR3528, France; Centre de Bioinformatique, Biostatistique et Biologie Intégrative, Institut Pasteur and CNRS USR3756, France
(3) Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR3528, France; Centre de Bioinformatique, Biostatistique et Biologie Intégrative, Institut Pasteur and CNRS USR3756, France
(4) Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR3528, France; Centre de Bioinformatique, Biostatistique et Biologie Intégrative, Institut Pasteur and CNRS USR3756, France; Corresponding author at: Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR3528, France.
(5) Laboratoire de Recherche en Informatique, Université Paris-Sud and CNRS UMR8623, France
collection DOAJ
language English
format Article
sources DOAJ
author Chuan Xu
Guillaume Bouvier
Benjamin Bardiaux
Michael Nilges
Thérèse Malliavin
Abdel Lisser
title Ordering Protein Contact Matrices
publisher Elsevier
series Computational and Structural Biotechnology Journal
issn 2001-0370
publishDate 2018-01-01
description Numerous biophysical approaches provide information about the spatial proximity of residues in proteins. However, correct assignment of the protein fold from this proximity information is not straightforward if the spatially close protein residues are not assigned to residues in the primary sequence. Here, we propose an algorithm to assign such residue numbers by ordering the columns and rows of the raw protein contact matrix obtained directly from proximity information between unassigned amino acids. The ordering problem is formulated as the search for a trail within a graph connecting protein residues through the nonzero contact values. The algorithm proceeds in two steps: (i) finding the longest trail of the graph using an original dynamic programming algorithm, and (ii) clustering the individual ordered matrices using a self-organizing map (SOM) approach. The combination of the dynamic programming and self-organizing map approaches is an innovative aspect of the present work. The algorithm was validated on a set of about 900 proteins, representative of the sizes and proportions of secondary structures observed in the Protein Data Bank. The algorithm proved efficient for noise levels up to 40%, with average gaps of at most about 20% between ordered and initial matrices. The proposed approach paves the way toward a method of fold prediction from noisy proximity information, as TM scores larger than 0.5 were obtained for ten randomly chosen proteins at a noise level of 10%. The method has also been validated on two experimental cases, on which it performed satisfactorily. Keywords: Protein contact matrix, Fold prediction, Graph theory, Dynamic programming, Self-organizing map
url http://www.sciencedirect.com/science/article/pii/S2001037017301058
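The ordering problem named in the description can be illustrated on a toy case. The sketch below is not the authors' algorithm: it assumes a noise-free contact matrix of an idealized chain (each residue touches only its sequence neighbours), replaces the trail search by a longest simple path search, and uses a plain bitmask dynamic program that is only practical for a handful of residues. The clustering step with a self-organizing map is not shown. All function names are hypothetical.

```python
import random


def chain_contacts(n):
    """Contact matrix of an idealized chain: residue i touches only i-1 and i+1."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


def permute(matrix, order):
    """Reorder rows and columns together: new index a takes old index order[a]."""
    return [[matrix[i][j] for j in order] for i in order]


def longest_path(matrix):
    """Longest simple path in the contact graph (vertices = residues,
    edges = nonzero contacts), by dynamic programming over
    (visited set, end vertex) states. Exponential cost: toy sizes only."""
    n = len(matrix)
    # parent[(mask, v)] = previous vertex on a path visiting `mask`, ending at v
    parent = {(1 << v, v): None for v in range(n)}
    best = (1, 0)
    for mask in range(1, 1 << n):
        for v in range(n):
            if (mask, v) not in parent:
                continue
            if bin(mask).count("1") > bin(best[0]).count("1"):
                best = (mask, v)
            for w in range(n):
                # extend the path to any unvisited residue in contact with v
                if matrix[v][w] and not mask & (1 << w):
                    parent.setdefault((mask | (1 << w), w), v)
    # walk back from the best state to recover the path
    path, state = [], best
    while state is not None:
        mask, v = state
        path.append(v)
        u = parent[state]
        state = None if u is None else (mask ^ (1 << v), u)
    return path[::-1]


# Shuffle the residue labels, then recover an ordering from contacts alone.
n = 8
original = chain_contacts(n)
labels = list(range(n))
random.Random(0).shuffle(labels)
shuffled = permute(original, labels)
recovered = longest_path(shuffled)
print(permute(shuffled, recovered) == original)
```

In this idealized setting the recovered order equals the original sequence up to reversal, so reordering the shuffled matrix restores the banded chain matrix. With long-range contacts or noise, many paths compete, which is presumably why the description pairs the path search with a clustering of the individual ordered matrices.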