Phylogenetic tree information aids supervised learning for predicting protein-protein interaction based on distance matrices

Bibliographic Details
Main Authors: Liao Li, Craig Roger A
Format: Article
Language: English
Published: BMC 2007-01-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/8/6
id doaj-7812ea037a5349d9a93aade49e986799
record_format Article
spelling doaj-7812ea037a5349d9a93aade49e986799 2020-11-25T01:03:37Z eng BMC BMC Bioinformatics 1471-2105 2007-01-01 8 1 6 10.1186/1471-2105-8-6 http://www.biomedcentral.com/1471-2105/8/6
collection DOAJ
issn 1471-2105
description Abstract
Background: Protein-protein interactions are critical for cellular functions. Recently developed computational approaches for predicting protein-protein interactions use co-evolutionary information about the interacting partners, e.g., correlations between distance matrices, where each matrix stores the pairwise distances between a protein and its orthologs from a group of reference genomes.
Results: We propose a novel, simple method that accounts for some of the intra-matrix correlations in order to improve prediction accuracy. Specifically, the phylogenetic species tree of the reference genomes is used as a guide tree for hierarchical clustering of the orthologous proteins. The distances between these clusters, derived from the original pairwise distance matrix using the Neighbor Joining algorithm, form intermediate distance matrices, which are then transformed and concatenated into a super phylogenetic vector. A support vector machine is trained and tested on pairs of proteins, represented as super phylogenetic vectors, whose interactions are known. Performance, measured as ROC score in cross-validation experiments, improves significantly with our method (ROC score 0.8446) over Pearson correlations (0.6587).
Conclusion: We have shown that the phylogenetic tree can be used as a guide to extract intra-matrix correlations in the distance matrices of orthologous proteins, where these correlations are represented as intermediate distance matrices of the ancestral orthologous proteins. Both the unsupervised and supervised learning paradigms benefit from the explicit inclusion of these intermediate distance matrices, particularly the supervised one, which offers a better balance between sensitivity and specificity in the prediction of protein-protein interactions.
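The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `merges` encoding of the guide tree, and the toy matrix are all hypothetical, and the "transformation" step mentioned in the abstract is omitted because the abstract does not specify it. The sketch assumes that clusters are joined in the order dictated by the species tree and that distances to each new parent node follow the standard Neighbor Joining reduction, d(u,k) = (d(i,k) + d(j,k) - d(i,j)) / 2. A Pearson correlation over the upper triangles of two distance matrices is included as the baseline the abstract compares against.

```python
from math import sqrt


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (baseline score)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def merge_clusters(matrix, i, j):
    """Replace rows/columns i and j by their parent node.

    Distances to the parent use the Neighbor Joining reduction formula.
    The parent is appended as the last row/column of the returned matrix.
    """
    keep = [k for k in range(len(matrix)) if k not in (i, j)]
    reduced = [[matrix[a][b] for b in keep] for a in keep]
    to_parent = [(matrix[i][k] + matrix[j][k] - matrix[i][j]) / 2 for k in keep]
    for row, dist in zip(reduced, to_parent):
        row.append(dist)
    reduced.append(to_parent + [0.0])
    return reduced


def super_phylogenetic_vector(matrix, merges):
    """Concatenate the original and all intermediate distance matrices.

    `merges` is a hypothetical encoding of the guide tree: a list of index
    pairs, each referring to positions in the *current* (already reduced)
    matrix, applied in order.
    """
    vector = upper_triangle(matrix)
    current = matrix
    for i, j in merges:
        current = merge_clusters(current, i, j)
        vector.extend(upper_triangle(current))
    return vector


# Toy example: four reference genomes with guide tree ((0,1),(2,3)).
toy = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
guide_merges = [(0, 1), (0, 1)]  # join 0 and 1, then the remaining 2 and 3
vector = super_phylogenetic_vector(toy, guide_merges)
```

For a protein pair, the two super phylogenetic vectors could then be concatenated and passed to any support vector machine implementation as a single training example, while `pearson(upper_triangle(a), upper_triangle(b))` gives the unsupervised baseline score for matrices `a` and `b`.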