Prediction of protein cleavage site with feature selection by random forest.

Proteinases play critical roles in both intra- and extracellular processes by binding and cleaving their protein substrates. Cleavage can be either non-specific, as part of degradation during protein catabolism, or highly specific, as part of proteolytic cascades and signal transduction events. Iden...

Bibliographic Details
Main Authors: Bi-Qing Li, Yu-Dong Cai, Kai-Yan Feng, Gui-Jun Zhao
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2012-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3445488?pdf=render
spelling doaj-c666a553d6244283bd83e94b79dd9499; 2020-11-25T01:21:31Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2012-01-01; 7; 9; e45854; 10.1371/journal.pone.0045854
collection DOAJ
sources DOAJ
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2012-01-01
description Proteinases play critical roles in both intra- and extracellular processes by binding and cleaving their protein substrates. Cleavage can be either non-specific, as part of degradation during protein catabolism, or highly specific, as part of proteolytic cascades and signal transduction events. Identifying these targets is extremely challenging. Current computational approaches for predicting cleavage sites are limited because they mainly represent amino acid sequences as patterns or frequency matrices. In this work, we developed a novel predictor based on the Random Forest (RF) algorithm, with features chosen by the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). Physicochemical/biochemical properties, sequence conservation, residual disorder, amino acid occurrence frequency, secondary structure and solvent accessibility were used as features to represent the peptides concerned. We compared our predictor with existing tools for predicting possible cleavage sites in candidate substrates. Our method makes much more reliable predictions in terms of overall prediction accuracy. In addition, the predictor can be applied to a wide range of proteinases.
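The feature-selection pipeline named in the description (mRMR ranking followed by incremental feature selection) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature columns and labels are invented toy data, the mRMR score uses the common "relevance minus mean redundancy" form on discrete features, and the evaluator is a leave-one-out 1-nearest-neighbour accuracy that stands in for the Random Forest classifier the record mentions. All function names are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mrmr_rank(columns, labels):
    """Greedy mRMR: repeatedly pick the feature with the highest
    relevance to the labels minus mean redundancy with features already picked."""
    relevance = [mutual_information(col, labels) for col in columns]
    remaining, ranked = list(range(len(columns))), []

    def score(j):
        if not ranked:
            return relevance[j]
        redundancy = sum(mutual_information(columns[j], columns[k])
                         for k in ranked) / len(ranked)
        return relevance[j] - redundancy

    while remaining:
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked

def incremental_feature_selection(ranked, evaluate):
    """IFS: score the top-1, top-2, ... subsets of the ranking and
    keep the smallest subset that reaches the best score."""
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked) + 1):
        current = evaluate(ranked[:k])
        if current > best_score:
            best_subset, best_score = ranked[:k], current
    return best_subset, best_score

def make_loo_1nn_evaluator(columns, labels):
    """Stand-in evaluator (NOT Random Forest): leave-one-out accuracy of a
    1-nearest-neighbour rule using Hamming distance on the chosen features."""
    n = len(labels)

    def evaluate(subset):
        correct = 0
        for i in range(n):
            nearest = min((j for j in range(n) if j != i),
                          key=lambda j: sum(columns[f][i] != columns[f][j]
                                            for f in subset))
            correct += labels[nearest] == labels[i]
        return correct / n

    return evaluate

# Toy data (invented): the label is determined by features 0 and 2 together;
# feature 1 duplicates feature 0, and feature 3 is unrelated noise.
a     = [0, 0, 0, 0, 1, 1, 1, 1]
b     = [0, 0, 1, 1, 0, 0, 1, 1]
noise = [0, 1, 0, 1, 0, 1, 0, 1]
columns = [a, list(a), b, noise]
labels = [2 * ai + bi for ai, bi in zip(a, b)]

ranked = mrmr_rank(columns, labels)
selected, score = incremental_feature_selection(
    ranked, make_loo_1nn_evaluator(columns, labels))
print(ranked, selected, score)
```

On this toy data the duplicated feature is pushed behind the complementary one in the ranking, and IFS stops at the two features that jointly determine the label. In a setting closer to the one described, the evaluator would be replaced by cross-validated accuracy of a Random Forest trained on the candidate subset.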