FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues.

A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses, are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challen...


Bibliographic Details
Main Authors: Yasser El-Manzalawy, Mostafa Abbas, Qutaibah Malluhi, Vasant Honavar
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2016-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC4934694?pdf=render
id doaj-527b13f4167b475b9dd85cd454a46d6d
record_format Article
spelling doaj-527b13f4167b475b9dd85cd454a46d6d; 2020-11-24T20:50:15Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2016-01-01; vol. 11, no. 7, e0158445; doi:10.1371/journal.pone.0158445; FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues.; Yasser El-Manzalawy, Mostafa Abbas, Qutaibah Malluhi, Vasant Honavar; http://europepmc.org/articles/PMC4934694?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Yasser El-Manzalawy
Mostafa Abbas
Qutaibah Malluhi
Vasant Honavar
title FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2016-01-01
description A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses, are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational effort needed to generate PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that randomly sampled databases produce better PSSM profiles, as measured by the number of hits used to generate the profile, the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data, and the accuracy of the machine learning classifier trained using these profiles. Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission. Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential to substantially speed up, and hence increase the practical utility of, other amino acid sequence-based predictors of protein-protein and protein-DNA interfaces.
url http://europepmc.org/articles/PMC4934694?pdf=render
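The abstract describes building a reduced reference database from 1% of the UniRef100 sequences, sampled uniformly at random. The record does not say how the authors drew that sample, so the following Python sketch is only an illustration under stated assumptions: it uses single-pass reservoir sampling (Algorithm R) over FASTA records, and the names read_fasta, sample_size, and reservoir_sample are hypothetical helpers, not part of FastRNABindR. The toy input stands in for a real UniRef100 FASTA file.

```python
import random
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header without '>', concatenated sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sample_size(total: int, fraction: float) -> int:
    """Number of records to keep for a given fraction (at least one)."""
    return max(1, round(total * fraction))


def reservoir_sample(records: Iterable[Record], k: int,
                     seed: Optional[int] = None) -> List[Record]:
    """Draw k records uniformly at random in one pass (Algorithm R).

    Only k records are held in memory, which matters when the input
    has tens of millions of sequences.
    """
    rng = random.Random(seed)
    reservoir: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return reservoir


# Toy stand-in for a large FASTA file: 200 short made-up records.
toy_lines: List[str] = []
for n in range(200):
    toy_lines.append(f">seq{n}")
    toy_lines.append("MKV" + "A" * (n % 5))

k = sample_size(200, 0.01)  # 1% of 200 records
subset = reservoir_sample(read_fasta(toy_lines), k, seed=0)
for header, seq in subset:
    print(f">{header}\n{seq}")
```

In practice the sampled records would be written to a FASTA file and formatted as a BLAST database before generating PSSM profiles; that step is not shown here.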