Selecting Near-Native Protein Structures from Predicted Decoy Sets Using Ordered Graphlet Degree Similarity

Effective prediction of protein tertiary structure from sequence is an important and challenging problem in computational structural biology. Ab initio protein structure prediction relies on the amino acid sequence alone and therefore has a wide range of applications. With the ab initio method, a large number...

Bibliographic Details
Main Authors: Xu Han, Li Li, Yonggang Lu
Format: Article
Language: English
Published: MDPI AG 2019-02-01
Series: Genes
Subjects: GR_score; dynamic programming; gap penalty; near-native protein; protein structure prediction
Online Access: https://www.mdpi.com/2073-4425/10/2/132
id doaj-730d748de4554db2bc7bb1597402a761
record_format Article
doi 10.3390/genes10020132
volume 10
issue 2
article_number 132
author_affiliation School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China (all three authors)
collection DOAJ
language English
format Article
sources DOAJ
author Xu Han
Li Li
Yonggang Lu
title Selecting Near-Native Protein Structures from Predicted Decoy Sets Using Ordered Graphlet Degree Similarity
publisher MDPI AG
series Genes
issn 2073-4425
publishDate 2019-02-01
description Effective prediction of protein tertiary structure from sequence is an important and challenging problem in computational structural biology. Ab initio protein structure prediction relies on the amino acid sequence alone and therefore has a wide range of applications. The ab initio method can predict a large number of candidate protein structures, called a decoy set; however, selecting a good near-native structure from the predicted decoy set is difficult. In this work, we propose a new method for selecting the near-native structure from the decoy set based on both contact map overlap (CMO) and graphlets. By generalizing graphlets to ordered graphs and using dynamic programming to select the optimal alignment with an introduced gap penalty, a GR_score is defined for calculating the similarity between three-dimensional (3D) decoy structures. The proposed method was applied to all 54 single-domain targets in CASP11 and all 43 targets in CASP10, and ensemble clustering was used to cluster the protein decoy structures based on the computed GR_scores. The most popular centroid structure was selected as the near-native structure. The experiments showed that, compared with the SPICKER method used in I-TASSER, the proposed method can usually select better near-native structures in terms of the similarity between the selected structure and the true native structure.
topic GR_score
dynamic programming
gap penalty
near-native protein
protein structure prediction
url https://www.mdpi.com/2073-4425/10/2/132
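
The description mentions using dynamic programming to select an optimal alignment with a gap penalty. The sketch below illustrates that general technique only; it is not the authors' GR_score. It assumes each residue is summarized by a numeric feature vector (a hypothetical stand-in for an ordered graphlet degree signature), and the names `residue_similarity`, `alignment_score`, and the default `gap_penalty` of 0.5 are illustrative choices that do not come from the record.

```python
from typing import List, Sequence


def residue_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Similarity in [0, 1] between two per-residue feature vectors.

    Hypothetical measure: 1 minus the normalized L1 distance (1.0 = identical).
    """
    diff = sum(abs(a - b) for a, b in zip(u, v))
    total = sum(abs(a) + abs(b) for a, b in zip(u, v))
    return 1.0 if total == 0 else 1.0 - diff / total


def alignment_score(
    x: List[Sequence[float]],
    y: List[Sequence[float]],
    gap_penalty: float = 0.5,  # assumed value, not taken from the paper
) -> float:
    """Best global alignment score of two residue sequences (linear gap cost)."""
    n, m = len(x), len(y)
    # best[i][j]: best score aligning the first i residues of x
    # with the first j residues of y.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = -gap_penalty * i  # x residues aligned to gaps only
    for j in range(1, m + 1):
        best[0][j] = -gap_penalty * j  # y residues aligned to gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                # align residue i of x with residue j of y
                best[i - 1][j - 1] + residue_similarity(x[i - 1], y[j - 1]),
                # leave residue i of x unmatched (gap in y)
                best[i - 1][j] - gap_penalty,
                # leave residue j of y unmatched (gap in x)
                best[i][j - 1] - gap_penalty,
            )
    return best[n][m]
```

Under these assumptions, a pairwise score of this kind could be computed for every pair of decoys and the resulting matrix passed to a clustering step, which is the role the description assigns to the GR_score.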