Fast protein structure comparison through effective representation learning with contrastive graph neural networks

Bibliographic Details
Main Authors: Feng, S.-H. (Author), Pan, X. (Author), Shen, H.-B. (Author), Xia, C. (Author), Xia, Y. (Author)
Format: Article
Language: English
Published: Public Library of Science 2022
Online Access: https://doi.org/10.1371/journal.pcbi.1009986
LEADER 02777nam a2200349Ia 4500
001 10.1371-journal.pcbi.1009986
008 220425s2022 CNT 000 0 und d
020 |a 1553734X (ISSN) 
245 1 0 |a Fast protein structure comparison through effective representation learning with contrastive graph neural networks 
260 0 |b Public Library of Science  |c 2022 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1371/journal.pcbi.1009986 
520 3 |a Protein structure alignment algorithms are often time-consuming, which makes large-scale retrieval based on protein structure similarity challenging. As the number of known protein structures grows rapidly, more efficient structure comparison approaches are urgently needed. In this paper, we propose GraSR, an effective graph-based protein structure representation learning method for fast and accurate structure comparison. In GraSR, a graph is constructed from the inter-residue distances derived from the tertiary structure. Deep graph neural networks (GNNs) with a shortcut connection then learn graph representations of the tertiary structures under a contrastive learning framework. To further improve GraSR, we introduce a novel dynamic training data partition strategy and a length-scaling cosine distance. We objectively evaluate GraSR on SCOPe v2.07 and on a newly released independent test set from the PDB, using a purpose-designed comprehensive performance metric. Compared with other state-of-the-art methods, GraSR achieves an improvement of about 7%-10% on the two benchmark datasets, and it is also much faster than alignment-based methods. Analysis of the model shows that the superiority of GraSR comes mainly from the learned discriminative residue-level and global descriptors. The web server and source code of GraSR are freely available at www.csbio.sjtu.edu.cn/bioinf/GraSR/ for academic use. Copyright: © 2022 Xia et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
650 0 4 |a algorithm 
650 0 4 |a Algorithms 
650 0 4 |a article 
650 0 4 |a feature learning (machine learning) 
650 0 4 |a learning 
650 0 4 |a Learning 
650 0 4 |a Neural Networks, Computer 
650 0 4 |a performance indicator 
650 0 4 |a protein 
650 0 4 |a protein structure 
650 0 4 |a protein tertiary structure 
650 0 4 |a Proteins 
650 0 4 |a software 
650 0 4 |a Software 
700 1 |a Feng, S.-H.  |e author 
700 1 |a Pan, X.  |e author 
700 1 |a Shen, H.-B.  |e author 
700 1 |a Xia, C.  |e author 
700 1 |a Xia, Y.  |e author 
773 |t PLoS Computational Biology
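The 520 abstract describes GraSR's pipeline only at a high level: build a residue graph from inter-residue distances, embed it with GNN layers that use a shortcut connection, and compare structures through distances between the learned descriptors. The Python sketch below illustrates those generic ingredients under stated assumptions. It is not the GraSR source code (that is distributed from the web server listed in the abstract). The 8 Å C-alpha contact cutoff, the symmetric-normalized GCN layer, the mean pooling, the random untrained weights, and all function names (`contact_adjacency`, `residual_gcn_layer`, `graph_descriptor`, `rank_by_cosine`) are hypothetical choices made for illustration. Retrieval uses plain cosine similarity, not the paper's length-scaling cosine distance, and no contrastive training is shown.

```python
import numpy as np

# Hypothetical contact cutoff in angstroms; GraSR's actual edge rule may differ.
CONTACT_CUTOFF = 8.0


def contact_adjacency(ca_coords: np.ndarray, cutoff: float = CONTACT_CUTOFF) -> np.ndarray:
    """Binary residue-residue adjacency from an (L, 3) array of C-alpha coordinates."""
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)      # (L, L) inter-residue distances
    adj = (dist < cutoff).astype(np.float64)
    np.fill_diagonal(adj, 0.0)                # no self-edges in the raw graph
    return adj


def residual_gcn_layer(adj: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One generic GCN-style layer with a shortcut (residual) connection.

    Uses the symmetric normalization D^-1/2 (A + I) D^-1/2. w must be square
    so that the shortcut h + f(h) is shape-compatible.
    """
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0) + h


def graph_descriptor(adj: np.ndarray, h: np.ndarray, weights: list) -> np.ndarray:
    """Stack residual layers, then mean-pool residues into one global vector."""
    for w in weights:
        h = residual_gcn_layer(adj, h, w)
    return h.mean(axis=0)


def rank_by_cosine(query: np.ndarray, database: np.ndarray):
    """Return database indices sorted by descending plain cosine similarity."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = db @ q
    order = np.argsort(-scores)
    return order, scores[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "structure": four residues spaced 3.8 Å apart along the x-axis.
    coords = np.array([[3.8 * i, 0.0, 0.0] for i in range(4)])
    adj = contact_adjacency(coords)
    feats = rng.normal(size=(4, 8))  # hypothetical per-residue input features
    weights = [rng.normal(scale=0.1, size=(8, 8)) for _ in range(2)]  # untrained
    query = graph_descriptor(adj, feats, weights)
    database = rng.normal(size=(5, 8))  # stand-in descriptors of 5 other structures
    order, scores = rank_by_cosine(query, database)
    print(order, scores)
```

Because each structure is reduced to one fixed-length vector, a query can be scored against a precomputed descriptor database with a single matrix-vector product instead of one pairwise structural alignment per entry. That is the general reason embedding-based retrieval tends to scale better than alignment-based comparison.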