Protocols to capture the functional plasticity of protein domain superfamilies

Most proteins comprise several domains, segments that are clearly discernable in protein structure and sequence. Over the last two decades, it has become increasingly clear that domains are often also functional modules that can be duplicated and recombined in the course of evolution. This gives ris...


Bibliographic Details
Main Author: Rentzsch, R.
Published: University College London (University of London) 2012
Subjects:
570
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.565647
id ndltd-bl.uk-oai-ethos.bl.uk-565647
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-565647 2015-12-03T03:29:39Z Protocols to capture the functional plasticity of protein domain superfamilies Rentzsch, R. 2012 Most proteins comprise several domains, segments that are clearly discernable in protein structure and sequence. Over the last two decades, it has become increasingly clear that domains are often also functional modules that can be duplicated and recombined in the course of evolution. This gives rise to novel protein functions. Traditionally, protein domains are grouped into homologous domain superfamilies in resources such as SCOP and CATH. This is done primarily on the basis of similarities in their three-dimensional structures. A biologically sound subdivision of the domain superfamilies into families of sequences with conserved function has so far been missing. Such families form the ideal framework to study the evolutionary and functional plasticity of individual superfamilies. In the few existing resources that aim to classify domain families, a considerable amount of manual curation is involved. Whilst immensely valuable, the latter is inherently slow and expensive. It can thus impede large-scale application. This work describes the development and application of a fully-automatic pipeline for identifying functional families within superfamilies of protein domains. This pipeline is built around a method for clustering large-scale sequence datasets in distributed computing environments. In addition, it implements two different protocols for identifying families on the basis of the clustering results: a supervised and an unsupervised protocol. These are used depending on whether or not high-quality protein function annotation data are associated with a given superfamily. The results attained for more than 1,500 domain superfamilies are discussed in both a qualitative and quantitative manner. The use of domain sequence data in conjunction with Gene Ontology protein function annotations and a set of rules and concepts to derive families is a novel approach to large-scale domain sequence classification. Importantly, the focus lies on domain, not whole-protein function. 570 University College London (University of London) http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.565647 http://discovery.ucl.ac.uk/1348549/ Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 570
description Most proteins comprise several domains, segments that are clearly discernable in protein structure and sequence. Over the last two decades, it has become increasingly clear that domains are often also functional modules that can be duplicated and recombined in the course of evolution. This gives rise to novel protein functions. Traditionally, protein domains are grouped into homologous domain superfamilies in resources such as SCOP and CATH. This is done primarily on the basis of similarities in their three-dimensional structures. A biologically sound subdivision of the domain superfamilies into families of sequences with conserved function has so far been missing. Such families form the ideal framework to study the evolutionary and functional plasticity of individual superfamilies. In the few existing resources that aim to classify domain families, a considerable amount of manual curation is involved. Whilst immensely valuable, the latter is inherently slow and expensive. It can thus impede large-scale application. This work describes the development and application of a fully-automatic pipeline for identifying functional families within superfamilies of protein domains. This pipeline is built around a method for clustering large-scale sequence datasets in distributed computing environments. In addition, it implements two different protocols for identifying families on the basis of the clustering results: a supervised and an unsupervised protocol. These are used depending on whether or not high-quality protein function annotation data are associated with a given superfamily. The results attained for more than 1,500 domain superfamilies are discussed in both a qualitative and quantitative manner. The use of domain sequence data in conjunction with Gene Ontology protein function annotations and a set of rules and concepts to derive families is a novel approach to large-scale domain sequence classification. Importantly, the focus lies on domain, not whole-protein function.
author Rentzsch, R.
title Protocols to capture the functional plasticity of protein domain superfamilies
publisher University College London (University of London)
publishDate 2012
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.565647