A novel method for predicting RNA-interacting residues in proteins using a combination of feature-based and sequence template-based methods
RNA-binding proteins (RBPs) play a significant role in many cellular processes and in the regulation of gene expression. Accurately identifying the RNA-interacting residues in protein sequences is therefore crucial for determining the structure of RBPs and inferring their function for new drug design. The protein s...
Main Authors: | Jiazhi Song, Guixia Liu, Rongquan Wang, Liyan Sun, Ping Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2019-01-01 |
Series: | Biotechnology & Biotechnological Equipment |
Subjects: | rna-interacting residues, protein, ensemble learning, random forest |
Online Access: | http://dx.doi.org/10.1080/13102818.2019.1612275 |
id |
doaj-478b211948e84bc09d2d83e276dd6506 |
---|---|
record_format |
Article |
spelling |
doaj-478b211948e84bc09d2d83e276dd6506; 2020-11-24T21:40:09Z; eng; Taylor & Francis Group; Biotechnology & Biotechnological Equipment; 1310-2818; 1314-3530; 2019-01-01; 33; 1; 1138; 1149; 10.1080/13102818.2019.1612275; 1612275; A novel method for predicting RNA-interacting residues in proteins using a combination of feature-based and sequence template-based methods; Jiazhi Song (Jilin University); Guixia Liu (Jilin University); Rongquan Wang (Jilin University); Liyan Sun (Jilin University); Ping Zhang (Jilin University); http://dx.doi.org/10.1080/13102818.2019.1612275; rna-interacting residues; protein; ensemble learning; random forest |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Jiazhi Song Guixia Liu Rongquan Wang Liyan Sun Ping Zhang |
author_sort |
Jiazhi Song |
title |
A novel method for predicting RNA-interacting residues in proteins using a combination of feature-based and sequence template-based methods |
publisher |
Taylor & Francis Group |
series |
Biotechnology & Biotechnological Equipment |
issn |
1310-2818 1314-3530 |
publishDate |
2019-01-01 |
description |
RNA-binding proteins (RBPs) play a significant role in many cellular processes and in the regulation of gene expression. Accurately identifying the RNA-interacting residues in protein sequences is therefore crucial for determining the structure of RBPs and inferring their function for new drug design. Protein sequence, as basic information, has been widely used in protein research in combination with machine learning techniques. Here, we propose a sequence-based method to predict RNA-interacting residues in protein sequences. The method is composed of two predictors: a feature-based predictor and a sequence template-based predictor. The feature-based predictor applies a random forest (RF) classifier to protein sequence information. After the classification probability is obtained, an adjustment procedure is applied that takes into account the correlation between neighbouring RNA-interacting residues. The sequence template-based predictor selects the optimal template for the query sequence by multiple sequence alignment and maps the interacting residues of the template sequence onto the query sequence. Combining the two predictors improves both the coverage and the prediction performance of the method: the MCC value increases from 0.467 and 0.352 to 0.499 on our validation set. To evaluate the proposed method, an independent test set is used for comparison with two other hybrid methods. Our method outperforms both, with an overall accuracy of 0.817, an MCC value of 0.511 and an F-score of 0.605, which demonstrates that it can reliably predict RNA-interacting residues in protein sequences. Moreover, the effectiveness of the newly proposed adjustment procedure in the feature-based predictor is examined and analyzed in detail. |
topic |
rna-interacting residues protein ensemble learning random forest |
url |
http://dx.doi.org/10.1080/13102818.2019.1612275 |
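The description in this record reports overall accuracy, MCC and F-score for per-residue predictions. As a minimal sketch, assuming each residue is labelled 1 (RNA-interacting) or 0 (non-interacting), these three metrics can be computed from confusion-matrix counts as shown here. The function names and the toy labels are illustrative assumptions and are not taken from the article; the authors' exact evaluation protocol is not given in this record.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary residue labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of residues classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def f_score(tp, tn, fp, fn):
    """F1 score: harmonic mean of precision and recall (tn is unused)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example (hypothetical labels for an 8-residue sequence).
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
counts = confusion_counts(y_true, y_pred)
print("accuracy:", round(accuracy(*counts), 3))
print("F-score:", round(f_score(*counts), 3))
print("MCC:", round(mcc(*counts), 3))
```

MCC is often preferred alongside accuracy for this kind of task because interacting residues are typically a minority class, and accuracy alone can look high for a predictor that rarely predicts the positive class.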