Automated methods for the determination of homologous relationships and functional similarities between protein domains

CATH is a protein database of structural domains which are assigned to superfamilies through evidence of a common evolutionary ancestor. These superfamilies are further grouped by overall structural similarity into folds. This thesis explores several automated methods for recognising homologous relationships between these domains using the structural data from the Protein Data Bank (PDB).


Bibliographic Details
Main Author: Redfern, Oliver Charles
Published: University College London (University of London) 2007
Subjects: 572.6
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.500035
id ndltd-bl.uk-oai-ethos.bl.uk-500035
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-500035; 2016-08-04T03:29:31Z; Redfern, Oliver Charles; 2007; 572.6; University College London (University of London); http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.500035; http://discovery.ucl.ac.uk/1446055/; Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 572.6
description CATH is a protein database of structural domains which are assigned to superfamilies through evidence of a common evolutionary ancestor. These superfamilies are further grouped by overall structural similarity into folds. This thesis explores several automated methods for recognising homologous relationships between these domains using the structural data from the Protein Data Bank (PDB). The aim of this work was to aid the manual classification of domains into the database and to provide putative functional assignments for structures solved by the structural genomics initiatives. A fast, novel algorithm, CATHEDRAL, was developed to make fold assignments to regions of polypeptide chains. By combining a fast secondary-structure method (GRATH) with a slower residue-based method (SSAP), the algorithm was able to assign boundaries accurately for distant relatives that are undetectable by sequence methods. Sequence and structural conservation patterns were combined in a novel algorithm, FLORA, to develop structural templates specific to catalytic function. FLORA predicted the correct functional site in 80% of cases and, combined with global structure comparison, was able to assign domains to enzyme families within diverse superfamilies. Techniques in structure comparison were also applied to ab initio models of protein domains in order to assign them to fold groups within the CATH database. A novel scoring method was developed to pre-select models that were more likely to have adopted the correct fold. A selected sample of models for each target structure was then compared against representatives from the CATH database using the MAMMOTH and SSAP algorithms. Data from these alignments were combined using a Support Vector Machine to assign the target to a fold group within CATH. This work was generously supported by the Engineering and Physical Sciences Research Council.
author Redfern, Oliver Charles
title Automated methods for the determination of homologous relationships and functional similarities between protein domains
publisher University College London (University of London)
publishDate 2007
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.500035