High specificity automatic function assignment for enzyme sequences

Bibliographic Details
Main Author:	Roden, D. L.
Published:	University College London (University of London) 2011
Subjects:	004
Online Access:	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.565381
	http://discovery.ucl.ac.uk/1321566/

The number of protein sequences being deposited in databases is growing rapidly as a result of large-scale, high-throughput genome sequencing efforts. A large proportion of these sequences have no experimentally determined structure, and relatively few have high-quality, specific, experimentally determined functions. Because experimental determination of protein function is time-consuming, costly and technically complex, this situation is unlikely to change in the near future. One of the major challenges for bioinformatics is therefore to assign accurate, high-specificity functional information to these uncharacterised sequences automatically. As yet, this problem has not been solved with the accuracy and reliability needed to support detailed biological analysis on a genome-wide, automated, high-throughput scale.

This thesis aims to address this shortfall by providing and benchmarking methods for improving the accuracy of high-specificity function prediction from enzyme sequences. The datasets used are multiple alignments of evolutionarily related protein sequences identified by BLAST sequence database searches. First, a number of non-standard amino acid substitution matrices were used to re-score the benchmark multiple sequence alignments. A subset of these matrices was shown to improve the accuracy of specific function annotation compared with both the original BLAST sequence-similarity ordering and a random sequence selection model. Next, two established methods for identifying functional specificity-determining amino acid residues (fSDRs) were used to locate regions within the aligned sequences that are functionally and phylogenetically informative. These localised regions were then used to re-score the aligned sequences, to assess their ability to improve the specific functional annotation of the benchmark sequence sets. Finally, a machine learning approach based on support vector machines (SVMs) was used to evaluate whether fSDRs that improve annotation accuracy can be identified directly from alignments of closely related protein sequences, without prior knowledge of their specific functional sub-types. The performance of this SVM-based method was then assessed by applying it to the automatic functional assignment of a number of well-studied enzyme classes.
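The re-scoring idea described above (re-ranking aligned hits with a substitution score, optionally over selected columns only) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes pre-aligned, gapped sequences of equal length, a hypothetical toy scoring function (`toy_score`) standing in for a real substitution matrix, and a simple top-hit annotation transfer rule. The names `rescore`, `assign_function` and the `columns` parameter are illustrative only. Passing a set of column indices restricts the score to putative fSDR positions, mimicking the localised re-scoring step.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A hit is (identifier, aligned sequence, annotated function label).
Hit = Tuple[str, str, str]


def toy_score(a: str, b: str) -> int:
    """Hypothetical toy substitution score standing in for a real 20x20 matrix
    (e.g. BLOSUM62 or a non-standard matrix): +4 for identical residues,
    -1 for a mismatch, -2 for a residue against a gap, 0 for gap against gap."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 4 if a == b else -1


def rescore(query: str, hit: str,
            score: Callable[[str, str], int] = toy_score,
            columns: Optional[Sequence[int]] = None) -> int:
    """Sum per-column scores between an aligned query and an aligned hit.
    If `columns` is given (e.g. putative fSDR positions), only those
    alignment columns contribute; otherwise every column is scored."""
    if len(query) != len(hit):
        raise ValueError("aligned sequences must have equal length")
    cols = range(len(query)) if columns is None else columns
    return sum(score(query[i], hit[i]) for i in cols)


def assign_function(query: str, hits: List[Hit],
                    score: Callable[[str, str], int] = toy_score,
                    columns: Optional[Sequence[int]] = None) -> str:
    """Transfer the function label of the best-scoring hit after re-scoring.
    max() keeps the first maximal element, so ties go to the earlier hit
    (e.g. the one ranked higher in the original BLAST order)."""
    if not hits:
        raise ValueError("no hits to rank")
    best = max(hits, key=lambda h: rescore(query, h[1], score, columns))
    return best[2]


if __name__ == "__main__":
    # Hypothetical toy alignment; "EC_A" and "EC_B" are placeholder labels.
    query = "ACDEFGHK"
    hits = [
        ("hitA", "ACWWFGHK", "EC_A"),  # two mismatches, but matches column 7
        ("hitB", "ACDEFGHR", "EC_B"),  # one mismatch, at column 7
    ]
    print(assign_function(query, hits))               # full alignment: EC_B
    print(assign_function(query, hits, columns=[7]))  # column 7 only: EC_A
```

In this toy case, scoring every column favours `hitB` (27 versus 22), while scoring only column 7 favours `hitA`. This illustrates how restricting the score to a few informative positions can change which annotation is transferred.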
