Calpain cleavage prediction using multiple kernel learning.

Calpain, an intracellular Ca²⁺-dependent cysteine protease, is known to play a role in a wide range of metabolic pathways through limited proteolysis of its substrates. However, only a limited number of these substrates are currently known, with the exact mechanism of substrate recognition and cleav...

Bibliographic Details
Main Authors: David A DuVerle, Yasuko Ono, Hiroyuki Sorimachi, Hiroshi Mamitsuka
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2011-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3086883?pdf=render
id doaj-a2df3af651314009808a9e0aae3c4943
record_format Article
spelling doaj-a2df3af651314009808a9e0aae3c4943 2020-11-25T02:10:30Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2011-01-01 6 5 e19035 10.1371/journal.pone.0019035
collection DOAJ
language English
format Article
sources DOAJ
author David A DuVerle
Yasuko Ono
Hiroyuki Sorimachi
Hiroshi Mamitsuka
title Calpain cleavage prediction using multiple kernel learning.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2011-01-01
description Calpain, an intracellular Ca²⁺-dependent cysteine protease, is known to play a role in a wide range of metabolic pathways through limited proteolysis of its substrates. However, only a limited number of these substrates are currently known, and the exact mechanism of substrate recognition and cleavage by calpain remains largely unknown. While previous research has successfully applied standard machine-learning algorithms to accurately predict substrate cleavage by other similar types of proteases, that approach does not extend well to calpain, possibly due to its particular mode of proteolytic action and the limited amount of experimental data. Through the use of Multiple Kernel Learning, a recent extension to the classic Support Vector Machine framework, we were able to train complex models based on rich, heterogeneous feature sets, leading to significantly improved prediction quality (6% over the highest AUC score produced by state-of-the-art methods). In addition to producing a stronger machine-learning model for the prediction of calpain cleavage, we were able to highlight the importance and role of each feature of substrate sequences in defining specificity: primary sequence, secondary structure and solvent accessibility. Most notably, we showed that significant specificity differences exist across calpain sub-types, despite previous assumptions to the contrary. Prediction accuracy was further validated using, as an unbiased test set, mutated sequences of calpastatin (the endogenous inhibitor of calpain) modified to no longer block calpain's proteolytic action. An online implementation of our prediction tool is available at http://calpain.org.
url http://europepmc.org/articles/PMC3086883?pdf=render