Prediction of protein cleavage site with feature selection by random forest.
Proteinases play critical roles in both intra and extracellular processes by binding and cleaving their protein substrates. The cleavage can either be non-specific as part of degradation during protein catabolism or highly specific as part of proteolytic cascades and signal transduction events. Iden...
Main Authors: | Bi-Qing Li, Yu-Dong Cai, Kai-Yan Feng, Gui-Jun Zhao |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2012-01-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC3445488?pdf=render |
id |
doaj-c666a553d6244283bd83e94b79dd9499 |
---|---|
record_format |
Article |
spelling |
doaj-c666a553d6244283bd83e94b79dd9499; 2020-11-25T01:21:31Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2012-01-01; vol. 7, no. 9, e45854; doi:10.1371/journal.pone.0045854; Prediction of protein cleavage site with feature selection by random forest; Bi-Qing Li, Yu-Dong Cai, Kai-Yan Feng, Gui-Jun Zhao; http://europepmc.org/articles/PMC3445488?pdf=render |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Bi-Qing Li, Yu-Dong Cai, Kai-Yan Feng, Gui-Jun Zhao |
title |
Prediction of protein cleavage site with feature selection by random forest. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS ONE |
issn |
1932-6203 |
publishDate |
2012-01-01 |
description |
Proteinases play critical roles in both intra and extracellular processes by binding and cleaving their protein substrates. The cleavage can either be non-specific as part of degradation during protein catabolism or highly specific as part of proteolytic cascades and signal transduction events. Identification of these targets is extremely challenging. Current computational approaches for predicting cleavage sites are very limited since they mainly represent the amino acid sequences as patterns or frequency matrices. In this work, we developed a novel predictor based on Random Forest algorithm (RF) using maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). The features of physicochemical/biochemical properties, sequence conservation, residual disorder, amino acid occurrence frequency, secondary structure and solvent accessibility were utilized to represent the peptides concerned. Here, we compared existing prediction tools which are available for predicting possible cleavage sites in candidate substrates with ours. It is shown that our method makes much more reliable predictions in terms of the overall prediction accuracy. In addition, this predictor allows the use of a wide range of proteinases. |
url |
http://europepmc.org/articles/PMC3445488?pdf=render |
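The description above names a pipeline of maximum relevance minimum redundancy (mRMR) ranking followed by incremental feature selection (IFS). A minimal sketch of that pipeline is given below. It is an illustration only, not the authors' implementation. Several things are assumptions and do not come from the record: all function names are hypothetical, features are treated as discrete values, redundancy is taken as the mean mutual information with already selected features, evaluation uses leave-one-out accuracy, and a 1-nearest-neighbour classifier stands in for the Random Forest so that the example needs only the standard library. The toy data are invented.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())


def mutual_info(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mrmr_rank(columns, labels):
    """Greedy mRMR ordering of feature indices.

    At each step, pick the feature with the largest relevance I(f; label)
    minus its mean redundancy with the features already ranked.
    """
    relevance = [mutual_info(col, labels) for col in columns]
    remaining = list(range(len(columns)))
    ranked = []
    while remaining:
        def score(j):
            if not ranked:
                return relevance[j]
            redundancy = sum(
                mutual_info(columns[j], columns[k]) for k in ranked
            ) / len(ranked)
            return relevance[j] - redundancy

        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def loo_accuracy(columns, labels, feats):
    """Leave-one-out accuracy of a 1-NN (Hamming distance) classifier.

    Assumption: this simple classifier is a stand-in for Random Forest.
    """
    n = len(labels)
    rows = [tuple(columns[j][i] for j in feats) for i in range(n)]
    correct = 0
    for i in range(n):
        nearest = min(
            (k for k in range(n) if k != i),
            key=lambda k: sum(a != b for a, b in zip(rows[i], rows[k])),
        )
        correct += labels[nearest] == labels[i]
    return correct / n


def incremental_feature_selection(columns, labels, evaluate=loo_accuracy):
    """IFS: grow the feature set along the mRMR ranking, keep the best prefix."""
    ranked = mrmr_rank(columns, labels)
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked) + 1):
        acc = evaluate(columns, labels, ranked[:k])
        if acc > best_acc:  # strict: prefer the smallest best-scoring set
            best_k, best_acc = k, acc
    return ranked[:best_k], best_acc


# Invented toy data: one column per feature, one entry per sample.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # feature 0: fully informative
    [0, 0, 0, 1, 1, 1, 1, 1],  # feature 1: informative but redundant
    [0, 1, 0, 1, 0, 1, 0, 1],  # feature 2: unrelated to the label
]
selected, acc = incremental_feature_selection(columns, labels)
print(selected, acc)
```

The `evaluate` argument is kept pluggable so that a Random Forest with cross-validation, as named in the description, could be substituted for the stand-in classifier without changing the ranking or the IFS loop.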