Developing a novel method for homology detection of transmembrane proteins

Bibliographic Details
Main Author: Hurwitz, N.
Published: University College London (University of London) 2013
Subjects:
004
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.626090
Description
Summary: Analysis of the complete genomic sequences of several organisms indicates that 20-25% of all genes code for transmembrane proteins (Jones, 1998; Wallin and von Heijne, 1998), yet only a very small number of transmembrane 3D structures are known. Hence, it is of great importance to develop theoretical methods capable of predicting transmembrane protein structure and function from protein sequence alone. To address this, we sought to devise a systematic, high-throughput method for identifying homologous transmembrane proteins. Since protein structure is more evolutionarily conserved than amino acid sequence, we predicted that adding structural information to simple sequence alignment would improve homology detection of transmembrane proteins. In the present work, we describe the development of a search method that combines sequence alignment with structural information. In our method, the initial sequence alignment searches are performed using PSI-BLAST. Profiles derived from the resulting multiple sequence alignments are then input into a neural network, developed in this work, that predicts whether each transmembrane residue is buried (in the core of the helix bundle) or exposed (to the lipid environment). A maximum accuracy of 86% was achieved. Moreover, for almost half of the query set, the predicted residue orientation was more than 70% accurate. In the last step of the work presented here, the predicted helix locations, residue orientations and loop length scores are added to the PSI-BLAST E-value to create a ‘combined’ classifier. A linear equation was built for calculating the ‘combined’ classifier score. Our method was evaluated using two databases of proteins: Pfam and GPCRDB. The Pfam database was chosen because the transmembrane proteins in it have been classified into various families. GPCRDB was employed because this database, though narrow, is well studied and well maintained.
Before building the ‘combined’ classifier, PSI-BLAST sequence alignment was benchmarked using the Pfam database. We found that our ‘combined’ classifier, compared with a classifier based solely on PSI-BLAST, yielded more true positives with fewer false positives when tested using GPCRDB, and could differentiate between GPCRDB families. However, our ‘combined’ classifier did not improve homology detection when searching transmembrane proteins from the Pfam database. A comparison of our ‘combined’ classifier method with two other published methods suggested that profile-profile based searches could be more powerful than profile-sequence based searches, even after the addition of structural information as described here. In light of our study, we propose that combining structural information with profile-profile sequence alignment into a ‘combined’ classifier could result in a search method superior to existing ones for detecting homologous transmembrane proteins.
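
The summary states that a linear equation combines the PSI-BLAST E-value with scores for helix location, residue orientation and loop length, but it does not give the equation itself. The Python sketch below is only an illustration of what such a linear score might look like. The weights, the use of -log10 of the E-value, the 0 to 1 scaling of the structural scores, and all names (`Hit`, `combined_score`, `rank_hits`) are assumptions made for this example and are not taken from the thesis.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    """One database hit with hypothetical per-feature scores."""
    name: str
    e_value: float            # PSI-BLAST E-value (smaller is better)
    helix_score: float        # assumed 0..1 agreement of predicted helix locations
    orientation_score: float  # assumed 0..1 agreement of buried/exposed predictions
    loop_score: float         # assumed 0..1 similarity of loop lengths


# Hypothetical weights: the thesis fits its own linear equation,
# and these values are placeholders chosen only for illustration.
WEIGHTS = {"e_value": 1.0, "helix": 2.0, "orientation": 2.0, "loop": 1.0}


def combined_score(hit: Hit, weights: dict = WEIGHTS) -> float:
    """Return a linear combination of sequence and structural terms."""
    # Assumption: use -log10(E-value) so that larger means more significant.
    # The clamp avoids log10(0) for E-values reported as zero.
    seq_term = -math.log10(max(hit.e_value, 1e-180))
    return (
        weights["e_value"] * seq_term
        + weights["helix"] * hit.helix_score
        + weights["orientation"] * hit.orientation_score
        + weights["loop"] * hit.loop_score
    )


def rank_hits(hits: list) -> list:
    """Sort hits from highest to lowest combined score."""
    return sorted(hits, key=combined_score, reverse=True)


if __name__ == "__main__":
    hits = [
        Hit("candidate_b", 1e-6, 0.1, 0.2, 0.3),
        Hit("candidate_a", 1e-5, 0.9, 0.8, 0.7),
    ]
    for hit in rank_hits(hits):
        print(hit.name, round(combined_score(hit), 2))
```

With these placeholder weights, a hit whose E-value is slightly worse can still rank first if its predicted structural features agree better with the query, which is the behaviour the ‘combined’ classifier is meant to capture.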