Prediction of protein-protein interaction sites in sequences and 3D structures by random forests.
Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structu...
Main Authors: | Mile Sikić, Sanja Tomić, Kristian Vlahovicek |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS) 2009-01-01 |
Series: | PLoS Computational Biology |
Online Access: | http://europepmc.org/articles/PMC2621338?pdf=render |
id |
doaj-5d84689868734389bcb58dfa80d2e594 |
---|---|
record_format |
Article |
spelling |
doaj-5d84689868734389bcb58dfa80d2e594 2020-11-25T02:05:18Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2009-01-01 5 1 e1000278 10.1371/journal.pcbi.1000278 Prediction of protein-protein interaction sites in sequences and 3D structures by random forests. Mile Sikić Sanja Tomić Kristian Vlahovicek Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i) a combination of sequence- and structure-derived parameters and (ii) sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance increases to a precision of 76% and a recall of 38% with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras-Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite a high accuracy using only sequence information. http://europepmc.org/articles/PMC2621338?pdf=render |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Mile Sikić Sanja Tomić Kristian Vlahovicek |
author_sort |
Mile Sikić |
title |
Prediction of protein-protein interaction sites in sequences and 3D structures by random forests. |
title_sort |
prediction of protein-protein interaction sites in sequences and 3d structures by random forests. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS Computational Biology |
issn |
1553-734X 1553-7358 |
publishDate |
2009-01-01 |
description |
Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i) a combination of sequence- and structure-derived parameters and (ii) sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance increases to a precision of 76% and a recall of 38% with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras-Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite a high accuracy using only sequence information. |
url |
http://europepmc.org/articles/PMC2621338?pdf=render |
_version_ |
1724938869336440832 |
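
The description above reports precision, recall and F-measure figures and a nine-residue sliding window. The sketch below only illustrates those two ideas. It assumes the F-measure is the usual harmonic mean of precision and recall, which reproduces the reported 40% and 51%. The function names and the `pad` placeholder for positions beyond the chain ends are hypothetical, since the record does not say how the authors treated chain termini or which features they derived from each window.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of the F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def sliding_windows(sequence, size=9, pad="X"):
    """Return one window of `size` residues centred on each sequence position.

    `pad` is a hypothetical filler for positions that fall outside the chain;
    the record does not state how the original method handled the termini.
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd so it has a central residue")
    half = size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]


# Figures quoted in the description: sequence only, then sequence plus structure.
print(round(f_measure(0.84, 0.26), 2))
print(round(f_measure(0.76, 0.38), 2))

# A three-residue window on a toy sequence, to keep the output short.
print(sliding_windows("ACDEF", size=3))
```

In a full predictor each window would be converted to a feature vector and passed to a Random Forests classifier that labels the central residue as interacting or not; that step is omitted here because the record gives no details of it.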