|
|
|
|
LEADER |
02777nam a2200349Ia 4500 |
001 |
10.1371-journal.pcbi.1009986 |
008 |
220425s2022 CNT 000 0 und d |
020 |
|
|
|a 1553734X (ISSN)
|
245 |
1 |
0 |
|a Fast protein structure comparison through effective representation learning with contrastive graph neural networks
|
260 |
|
0 |
|b Public Library of Science
|c 2022
|
856 |
|
|
|z View Fulltext in Publisher
|u https://doi.org/10.1371/journal.pcbi.1009986
|
520 |
3 |
|
|a Protein structure alignment algorithms are often time-consuming, resulting in challenges for large-scale protein structure similarity-based retrieval. There is an urgent need for more efficient structure comparison approaches as the number of protein structures increases rapidly. In this paper, we propose an effective graph-based protein structure representation learning method, GraSR, for fast and accurate structure comparison. In GraSR, a graph is constructed based on the inter-residue distances derived from the tertiary structure. Then, deep graph neural networks (GNNs) with a short-cut connection learn graph representations of the tertiary structures under a contrastive learning framework. To further improve GraSR, a novel dynamic training data partition strategy and a length-scaling cosine distance are introduced. We objectively evaluate GraSR on SCOPe v2.07 and a newly released independent test set from the PDB database, using a specially designed comprehensive performance metric. Compared with other state-of-the-art methods, GraSR achieves an improvement of about 7%-10% on the two benchmark datasets. GraSR is also much faster than alignment-based methods. Analysis of the model shows that the superiority of GraSR stems mainly from the learned discriminative residue-level and global descriptors. The web server and source code of GraSR are freely available at www.csbio.sjtu.edu.cn/bioinf/GraSR/ for academic use. Copyright: © 2022 Xia et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
|
650 |
0 |
4 |
|a algorithm
|
650 |
0 |
4 |
|a Algorithms
|
650 |
0 |
4 |
|a article
|
650 |
0 |
4 |
|a feature learning (machine learning)
|
650 |
0 |
4 |
|a learning
|
650 |
0 |
4 |
|a Learning
|
650 |
0 |
4 |
|a Neural Networks, Computer
|
650 |
0 |
4 |
|a performance indicator
|
650 |
0 |
4 |
|a protein
|
650 |
0 |
4 |
|a protein structure
|
650 |
0 |
4 |
|a protein tertiary structure
|
650 |
0 |
4 |
|a Proteins
|
650 |
0 |
4 |
|a software
|
650 |
0 |
4 |
|a Software
|
700 |
1 |
|
|a Feng, S.-H.
|e author
|
700 |
1 |
|
|a Pan, X.
|e author
|
700 |
1 |
|
|a Shen, H.-B.
|e author
|
700 |
1 |
|
|a Xia, C.
|e author
|
700 |
1 |
|
|a Xia, Y.
|e author
|
773 |
|
|
|t PLoS Computational Biology
|