Protocols to capture the functional plasticity of protein domain superfamilies

Most proteins comprise several domains, segments that are clearly discernable in protein structure and sequence. Over the last two decades, it has become increasingly clear that domains are often also functional modules that can be duplicated and recombined in the course of evolution. This gives ris...

Bibliographic Details
Main Author:	Rentzsch, R.
Published:	University College London (University of London) 2012
Subjects:	570
Online Access:	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.565647

id	ndltd-bl.uk-oai-ethos.bl.uk-565647
record_format	oai_dc
spelling	ndltd-bl.uk-oai-ethos.bl.uk-565647 2015-12-03T03:29:39Z Protocols to capture the functional plasticity of protein domain superfamilies Rentzsch, R. 2012 570 University College London (University of London) http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.565647 http://discovery.ucl.ac.uk/1348549/ Electronic Thesis or Dissertation
collection	NDLTD
sources	NDLTD
topic	570
description	Most proteins comprise several domains, segments that are clearly discernable in protein structure and sequence. Over the last two decades, it has become increasingly clear that domains are often also functional modules that can be duplicated and recombined in the course of evolution. This gives rise to novel protein functions. Traditionally, protein domains are grouped into homologous domain superfamilies in resources such as SCOP and CATH. This is done primarily on the basis of similarities in their three-dimensional structures. A biologically sound subdivision of the domain superfamilies into families of sequences with conserved function has so far been missing. Such families form the ideal framework to study the evolutionary and functional plasticity of individual superfamilies. In the few existing resources that aim to classify domain families, a considerable amount of manual curation is involved. Whilst immensely valuable, the latter is inherently slow and expensive. It can thus impede large-scale application. This work describes the development and application of a fully automatic pipeline for identifying functional families within superfamilies of protein domains. This pipeline is built around a method for clustering large-scale sequence datasets in distributed computing environments. In addition, it implements two different protocols for identifying families on the basis of the clustering results: a supervised and an unsupervised protocol. These are used depending on whether or not high-quality protein function annotation data are associated with a given superfamily. The results attained for more than 1,500 domain superfamilies are discussed in both a qualitative and quantitative manner. The use of domain sequence data in conjunction with Gene Ontology protein function annotations and a set of rules and concepts to derive families is a novel approach to large-scale domain sequence classification. Importantly, the focus lies on domain, not whole-protein function.
author	Rentzsch, R.
title	Protocols to capture the functional plasticity of protein domain superfamilies
publisher	University College London (University of London)
publishDate	2012
url	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.565647
