Automated methods for the determination of homologous relationships and functional similarities between protein domains
Main Author: | Redfern, Oliver Charles |
---|---|
Published: | University College London (University of London), 2007 |
Subjects: | 572.6 |
Format: | Electronic Thesis or Dissertation |
Online Access: | http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.500035 ; http://discovery.ucl.ac.uk/1446055/ |

Abstract:

CATH is a protein database of structural domains that are assigned to superfamilies through evidence of a common evolutionary ancestor. These superfamilies are further grouped into folds by overall structural similarity. This thesis explores several automated methods for recognising homologous relationships between these domains using structural data from the Protein Data Bank (PDB). The aim of this work was to aid the manual classification of domains in the database and to provide putative functional assignments for structures solved by the structural genomics initiatives.

A fast and novel algorithm, CATHEDRAL, was developed to make fold assignments to regions of polypeptide chains. By combining a fast secondary-structure method (GRATH) with a slower residue-based method (SSAP), the algorithm was able to accurately assign boundaries for distant relatives that are undetectable by sequence methods.

Sequence and structural conservation patterns were combined in a novel algorithm, FLORA, to develop structural templates specific to catalytic function. FLORA predicted the correct functional site in 80% of cases and, when combined with global structure comparison, was able to assign domains to enzyme families within diverse superfamilies.

Structure comparison techniques were also applied to ab initio models of protein domains in order to assign them to fold groups within the CATH database. A novel scoring method was developed to pre-select models that were more likely to have adopted the correct fold. A selected sample of models for each target structure was then compared against representatives from the CATH database using the MAMMOTH and SSAP algorithms. Data from these alignments were combined using a Support Vector Machine to assign the target to a fold group within CATH.

This work was generously supported by the Engineering and Physical Sciences Research Council.