High specificity automatic function assignment for enzyme sequences

The number of protein sequences being deposited in databases is currently growing rapidly as a result of large-scale, high-throughput genome sequencing efforts. A large proportion of these sequences have no experimentally determined structure. Also, relatively few have high-quality, specific, experim...

Bibliographic Details
Main Author: Roden, D. L.
Published: University College London (University of London) 2011
Subjects:
004
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.565381
id ndltd-bl.uk-oai-ethos.bl.uk-565381
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-565381; 2015-12-03T03:30:31Z; High specificity automatic function assignment for enzyme sequences; Roden, D. L.; 2011; 004; University College London (University of London); http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.565381; http://discovery.ucl.ac.uk/1321566/; Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 004
description The number of protein sequences deposited in databases is growing rapidly as a result of large-scale, high-throughput genome sequencing efforts. A large proportion of these sequences have no experimentally determined structure, and relatively few have high-quality, specific, experimentally determined functions. Because experimental determination of protein function is slow, costly and technically complex, this situation is unlikely to change in the near future. One of the major challenges for bioinformatics is therefore to assign highly accurate, high-specificity functional information to these uncharacterised protein sequences automatically. As yet, this problem has not been solved to a level of accuracy and reliability that would support detailed biological analysis on a genome-wide, automated, high-throughput scale. This thesis aims to address this shortfall by providing and benchmarking methods that can help improve the accuracy of high-specificity protein function prediction from enzyme sequences. The datasets used in these studies are multiple alignments of evolutionarily related protein sequences, identified through BLAST sequence database searches. Firstly, a number of non-standard amino acid substitution matrices were used to re-score the benchmark multiple sequence alignments. A subset of these matrices was shown to improve the accuracy of specific function annotation when compared with both the original BLAST sequence similarity ordering and a random sequence selection model. Following this, two established methods for identifying functional specificity determining amino acid residues (fSDRs) were used to identify regions within the aligned sequences that are functionally and phylogenetically informative. These localised sequence regions were then used to re-score the aligned sequences, and their ability to improve the specific functional annotation of the benchmark sequence sets was assessed. Finally, a machine learning approach (support vector machines, SVMs) was used to evaluate whether fSDRs that improve annotation accuracy can be identified directly from alignments of closely related protein sequences, without prior knowledge of their specific functional sub-types. The performance of this SVM-based method was then assessed by applying it to the automatic functional assignment of a number of well-studied classes of enzymes.
author Roden, D. L.
title High specificity automatic function assignment for enzyme sequences
publisher University College London (University of London)
publishDate 2011
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.565381