Protocols to capture the functional plasticity of protein domain superfamilies
Most proteins comprise several domains, segments that are clearly discernible in protein structure and sequence. Over the last two decades, it has become increasingly clear that domains are often also functional modules that can be duplicated and recombined in the course of evolution. This gives ris...
Main Author: | Rentzsch, R. |
---|---|
Published: | University College London (University of London), 2012 |
Subjects: | 570 |
Online Access: | http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.565647 |
id |
ndltd-bl.uk-oai-ethos.bl.uk-565647 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-bl.uk-oai-ethos.bl.uk-565647; 2015-12-03T03:29:39Z; Protocols to capture the functional plasticity of protein domain superfamilies; Rentzsch, R.; 2012; 570; University College London (University of London); http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.565647; http://discovery.ucl.ac.uk/1348549/; Electronic Thesis or Dissertation |
collection |
NDLTD |
sources |
NDLTD |
topic |
570 |
description |
Most proteins comprise several domains, segments that are clearly discernible in protein structure and sequence. Over the last two decades, it has become increasingly clear that domains are often also functional modules that can be duplicated and recombined in the course of evolution. This gives rise to novel protein functions. Traditionally, protein domains are grouped into homologous domain superfamilies in resources such as SCOP and CATH. This is done primarily on the basis of similarities in their three-dimensional structures. A biologically sound subdivision of the domain superfamilies into families of sequences with conserved function has so far been missing. Such families form the ideal framework to study the evolutionary and functional plasticity of individual superfamilies. In the few existing resources that aim to classify domain families, a considerable amount of manual curation is involved. Whilst immensely valuable, the latter is inherently slow and expensive. It can thus impede large-scale application. This work describes the development and application of a fully automatic pipeline for identifying functional families within superfamilies of protein domains. This pipeline is built around a method for clustering large-scale sequence datasets in distributed computing environments. In addition, it implements two different protocols for identifying families on the basis of the clustering results: a supervised and an unsupervised protocol. These are used depending on whether or not high-quality protein function annotation data are associated with a given superfamily. The results attained for more than 1,500 domain superfamilies are discussed in both a qualitative and quantitative manner. The use of domain sequence data in conjunction with Gene Ontology protein function annotations and a set of rules and concepts to derive families is a novel approach to large-scale domain sequence classification. Importantly, the focus lies on domain, not whole-protein function. |
author |
Rentzsch, R. |
title |
Protocols to capture the functional plasticity of protein domain superfamilies |
publisher |
University College London (University of London) |
publishDate |
2012 |
url |
http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.565647 |