A novel method for predicting RNA-interacting residues in proteins using a combination of feature-based and sequence template-based methods

RNA-binding proteins (RBPs) play a significant role in many cellular processes and in the regulation of gene expression. Accurately identifying the RNA-interacting residues in protein sequences is therefore crucial for determining the structure of RBPs and inferring their function for new drug design. The protein s...

Bibliographic Details
Main Authors: Jiazhi Song, Guixia Liu, Rongquan Wang, Liyan Sun, Ping Zhang
Format: Article
Language: English
Published: Taylor & Francis Group 2019-01-01
Series: Biotechnology & Biotechnological Equipment
Subjects: rna-interacting residues; protein; ensemble learning; random forest
Online Access: http://dx.doi.org/10.1080/13102818.2019.1612275
id doaj-478b211948e84bc09d2d83e276dd6506
record_format Article
spelling doaj-478b211948e84bc09d2d83e276dd6506; 2020-11-24T21:40:09Z; eng; Taylor & Francis Group; Biotechnology & Biotechnological Equipment; ISSN 1310-2818, 1314-3530; 2019-01-01; vol. 33, no. 1, pp. 1138-1149; DOI 10.1080/13102818.2019.1612275; Jiazhi Song, Guixia Liu, Rongquan Wang, Liyan Sun, Ping Zhang (all Jilin University)
collection DOAJ
language English
format Article
sources DOAJ
author Jiazhi Song
Guixia Liu
Rongquan Wang
Liyan Sun
Ping Zhang
title A novel method for predicting RNA-interacting residues in proteins using a combination of feature-based and sequence template-based methods
publisher Taylor & Francis Group
series Biotechnology & Biotechnological Equipment
issn 1310-2818
1314-3530
publishDate 2019-01-01
description RNA-binding proteins (RBPs) play a significant role in many cellular processes and in the regulation of gene expression. Accurately identifying the RNA-interacting residues in protein sequences is therefore crucial for determining the structure of RBPs and inferring their function for new drug design. Protein sequence, as basic information, has been widely used in protein research in combination with machine learning techniques. Here, we propose a sequence-based method to predict RNA-interacting residues in protein sequences. The method is composed of two predictors: a feature-based predictor and a sequence template-based predictor. The feature-based predictor applies a random forest (RF) classifier to protein sequence information. After the classification probability is obtained, an adjustment procedure is applied that takes into account the correlation between neighbouring RNA-interacting residues. The sequence template-based predictor selects the optimal template for the query sequence by multiple sequence alignment and maps the interacting residues of the template sequence onto the query sequence. Combining the two predictors greatly improves the coverage and prediction performance of our method: the MCC value increases from 0.467 and 0.352 for the two individual predictors to 0.499 on our validation set. To evaluate the proposed method, an independent testing set is used to compare it with two other hybrid methods. Our method achieves better performance than both, with an overall accuracy of 0.817, an MCC value of 0.511 and an F-score of 0.605, which demonstrates that it can reliably predict RNA-interacting residues in protein sequences. Moreover, the effectiveness of the newly proposed adjustment procedure in the feature-based predictor is examined and analyzed in detail.
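The three figures quoted in the description (overall accuracy, MCC and F-score) are standard measures for binary per-residue predictions. The following is a minimal sketch of how they are conventionally computed from a confusion matrix; it is not taken from the article. The function name binary_metrics and the toy labels are hypothetical, and label 1 is assumed to mark an RNA-interacting residue.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return accuracy, MCC and F-score for binary labels.

    Assumption: 1 marks an RNA-interacting residue, 0 a non-interacting one.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0

    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # F-score (F1): harmonic mean of precision and recall.
    f_denom = 2 * tp + fp + fn
    f_score = 2 * tp / f_denom if f_denom else 0.0

    return {"accuracy": accuracy, "mcc": mcc, "f_score": f_score}


# Toy example with made-up labels for eight residues (not data from the article).
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
print(binary_metrics(y_true, y_pred))
```

MCC is often reported alongside accuracy for this kind of task because interacting residues are typically a minority class, so accuracy alone can look high even when few binding residues are found.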
topic rna-interacting residues
protein
ensemble learning
random forest
url http://dx.doi.org/10.1080/13102818.2019.1612275