Probability weighted ensemble transfer learning for predicting interactions between HIV-1 and human proteins.

Reconstruction of host-pathogen protein interaction networks is of great significance for revealing the underlying microbial pathogenesis. However, the current experimentally derived networks are generally small and should be augmented by computational methods for less-biased biological inference. From t...

Bibliographic Details
Main Author: Suyu Mei
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2013-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3832534?pdf=render
id doaj-06ec6b396a1f422bb90314cf9f3c09ba
record_format Article
spelling doaj-06ec6b396a1f422bb90314cf9f3c09ba; 2020-11-24T21:16:19Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2013-01-01; vol. 8, no. 11, e79606; doi:10.1371/journal.pone.0079606; Probability weighted ensemble transfer learning for predicting interactions between HIV-1 and human proteins; Suyu Mei
collection DOAJ
language English
format Article
sources DOAJ
author Suyu Mei
title Probability weighted ensemble transfer learning for predicting interactions between HIV-1 and human proteins.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2013-01-01
description Reconstruction of host-pathogen protein interaction networks is of great significance for revealing the underlying microbial pathogenesis. However, the current experimentally derived networks are generally small and should be augmented by computational methods for less-biased biological inference. From the point of view of computational modelling, data scarcity, data unavailability and negative data sampling are the three major problems in reconstructing host-pathogen protein interaction networks. In this work, we address these three concerns and propose a probability weighted ensemble transfer learning model for HIV-human protein interaction prediction (PWEN-TLM), in which a support vector machine (SVM) is adopted as the individual classifier of the ensemble. In the model, data scarcity and data unavailability are tackled by homolog knowledge transfer. The importance of homolog knowledge is measured by the ROC-AUC of the individual classifiers, whose outputs are probability weighted to yield the final decision. In addition, we validate the assumption that homolog knowledge alone is sufficient to train a satisfactory model for host-pathogen protein interaction prediction. The model is thus more robust against data unavailability and places less demanding constraints on the data. As regards negative data construction, experiments show that excluding subcellularly co-localized proteins is unbiased and more reliable than random sampling. Last, we analyse the predictions that overlap between our model and existing models, and apply the model to the recognition of novel host-pathogen PPIs for further biological research.
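The description above says that the outputs of the individual classifiers are probability weighted according to their ROC-AUC. The record does not give the exact formula, so the following is only a minimal sketch under one plausible assumption: each classifier's predicted interaction probability is weighted by its validation AUC, and the weights are normalized to sum to one. The function names `roc_auc` and `weighted_probability` are hypothetical, and the per-classifier probabilities are taken as given rather than produced by a trained SVM.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def weighted_probability(probs: Sequence[float], aucs: Sequence[float]) -> float:
    """Combine per-classifier probabilities using normalized AUC weights.

    Assumption (not stated in the record): weight_i = auc_i / sum(aucs).
    """
    if not probs or len(probs) != len(aucs):
        raise ValueError("probs and aucs must be non-empty and of equal length")
    total = sum(aucs)
    if total <= 0:
        raise ValueError("AUC weights must sum to a positive value")
    return sum(a * p for a, p in zip(aucs, probs)) / total


if __name__ == "__main__":
    # Toy validation set: 1 = interacting pair, 0 = non-interacting pair.
    y_val = [1, 1, 0, 0]
    # Hypothetical validation scores from two individual classifiers.
    auc_a = roc_auc(y_val, [0.9, 0.8, 0.3, 0.1])
    auc_b = roc_auc(y_val, [0.6, 0.4, 0.5, 0.2])
    # Hypothetical probabilities both classifiers assign to one new protein pair.
    print(weighted_probability([0.8, 0.4], [auc_a, auc_b]))
```

Under this assumption a classifier built on more informative homolog knowledge (higher AUC) contributes more to the final decision, which matches the role the description assigns to the ROC-AUC metric.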