Prediction of protein-protein interaction sites in sequences and 3D structures by random forests.

Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structu...


Bibliographic Details
Main Authors: Mile Sikić, Sanja Tomić, Kristian Vlahovicek
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2009-01-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC2621338?pdf=render
id doaj-5d84689868734389bcb58dfa80d2e594
record_format Article
spelling doaj-5d84689868734389bcb58dfa80d2e594 2020-11-25T02:05:18Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2009-01-01 5 1 e1000278 10.1371/journal.pcbi.1000278 Prediction of protein-protein interaction sites in sequences and 3D structures by random forests. Mile Sikić Sanja Tomić Kristian Vlahovicek http://europepmc.org/articles/PMC2621338?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Mile Sikić
Sanja Tomić
Kristian Vlahovicek
title Prediction of protein-protein interaction sites in sequences and 3D structures by random forests.
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2009-01-01
description Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i) a combination of sequence- and structure-derived parameters and (ii) sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance increases to a precision of 76% and a recall of 38% with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras-Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite a high accuracy using only sequence information.
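The description's two quantitative ideas, the nine-residue sliding window and the F-measure, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the F-measure is the usual harmonic mean of precision and recall, that each window is centred on the residue being classified, and that sequence ends are padded with a placeholder character; the function names `f_measure` and `sliding_windows` and the example sequence are hypothetical.

```python
from typing import List


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of the F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def sliding_windows(sequence: str, size: int = 9, pad: str = "-") -> List[str]:
    """Return one window per residue, centred on that residue.

    The ends of the sequence are padded with `pad` so that every residue
    gets a full-length window. The padding scheme is an assumption made
    for illustration only.
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("window size must be a positive odd number")
    half = size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]


if __name__ == "__main__":
    # Check the reported figures against the assumed F-measure definition.
    print(round(100 * f_measure(0.84, 0.26)))  # sequence-only predictor
    print(round(100 * f_measure(0.76, 0.38)))  # sequence plus structure

    # Nine-residue windows over a made-up ten-residue sequence.
    for window in sliding_windows("ACDEFGHIKL"):
        print(window)
```

Under these assumptions the reported precision and recall pairs reproduce the stated F-measures of 40% and 51% after rounding. In a predictor of the kind described, each window would be turned into a feature vector and passed to a random forest classifier; that step is omitted here because the record does not specify the features.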
url http://europepmc.org/articles/PMC2621338?pdf=render