Improved detection of remote homologues using cascade PSI-BLAST: influence of neighbouring protein families on sequence coverage.

BACKGROUND: Development of sensitive sequence search procedures for the detection of distant relationships between proteins at superfamily/fold level is still a big challenge. The intermediate sequence search approach is the most frequently employed manner of identifying remote homologues effectivel...

Bibliographic Details
Main Authors: Swati Kaushik, Eshita Mutt, Ajithavalli Chellappan, Sandhya Sankaran, Narayanaswamy Srinivasan, Ramanathan Sowdhamini
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2013-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3577913?pdf=render
id doaj-4faa52626a6a45a59270e5ddfbe528af
record_format Article
spelling doaj-4faa52626a6a45a59270e5ddfbe528af; 2020-11-24T21:41:55Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2013-01-01; 8; 2; e56449; 10.1371/journal.pone.0056449; Improved detection of remote homologues using cascade PSI-BLAST: influence of neighbouring protein families on sequence coverage.; Swati Kaushik, Eshita Mutt, Ajithavalli Chellappan, Sandhya Sankaran, Narayanaswamy Srinivasan, Ramanathan Sowdhamini; http://europepmc.org/articles/PMC3577913?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Swati Kaushik
Eshita Mutt
Ajithavalli Chellappan
Sandhya Sankaran
Narayanaswamy Srinivasan
Ramanathan Sowdhamini
title Improved detection of remote homologues using cascade PSI-BLAST: influence of neighbouring protein families on sequence coverage.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2013-01-01
description BACKGROUND: Developing sensitive sequence search procedures that detect distant relationships between proteins at the superfamily/fold level remains a major challenge. The intermediate sequence search approach is the most frequently used way of identifying remote homologues effectively. In this study, serine proteases of the prolyl oligopeptidase, rhomboid and subtilisin protein families were examined, using plant serine proteases from two genomes (A. thaliana and O. sativa) as queries, along with 13 other families of unrelated folds, to identify distant homologues that could not be obtained using PSI-BLAST. METHODOLOGY/PRINCIPAL FINDINGS: We propose starting with multiple queries of classical serine protease members to identify remote homologues in families, using a rigorous approach such as Cascade PSI-BLAST. We found that classical sequence-based approaches, like PSI-BLAST, showed very low sequence coverage in identifying plant serine proteases. The algorithm was applied to an enriched sequence database of homologous domains, and we obtained an overall average coverage of 88% at the family level and 77% at the superfamily or fold level, along with a specificity of ~100% and a Matthews correlation coefficient of 0.91. A similar approach was also implemented on 13 other protein families representing every structural class in the SCOP database. Further investigation with statistical tests, like jackknifing, helped us to better understand the influence of neighbouring protein families. CONCLUSIONS/SIGNIFICANCE: Our study suggests that employing multiple queries from a family in Cascade PSI-BLAST searches is useful for predicting distant relationships effectively, even at the superfamily level. We have proposed a generalized strategy to cover all the distant members of a particular family using multiple query sequences. Our findings reveal that prior selection of query sequences and the presence of neighbouring families can be important for covering the search space effectively in minimal computational time. This study also provides an understanding of the 'bridging' role of related families.
url http://europepmc.org/articles/PMC3577913?pdf=render
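
The description above reports coverage, specificity and a Matthews correlation coefficient for the searches. As a rough illustration of how such figures relate to hit counts, the sketch below computes all three from a confusion matrix. It assumes that coverage means sensitivity, TP / (TP + FN), which the record does not state; the function name `search_metrics` and the counts in the usage line are hypothetical and are not taken from the article.

```python
import math


def search_metrics(tp, fp, tn, fn):
    """Return (coverage, specificity, mcc) from confusion-matrix counts.

    Assumption: "coverage" is treated as sensitivity, TP / (TP + FN).
    Any ratio with a zero denominator is reported as 0.0.
    """
    coverage = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient: (TP*TN - FP*FN) / sqrt of the
    # product of the four marginal totals.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return coverage, specificity, mcc


# Hypothetical counts, for illustration only.
coverage, specificity, mcc = search_metrics(tp=88, fp=0, tn=100, fn=12)
print(f"coverage={coverage:.2f} specificity={specificity:.2f} mcc={mcc:.2f}")
```

The coefficient depends on all four counts, so two searches with the same coverage and specificity can still differ in it when the numbers of true members and non-members differ.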