Summary: The function of an enzyme often depends on a few key functional residues. The principal objective of this project was to develop a novel function prediction system that exploits this by comparing the conserved amino acids in known enzyme families with those in a putative enzyme. Multiple sequence alignments of well-characterised enzyme families (those with an assigned EC number) are used to create unordered sets of conserved functional residues, termed <i>Treads</i>. A query protein's <i>Tread</i> is compared with the reference <i>Treads</i> by projecting them into a multidimensional space and measuring the distances between them. A major advantage of this prediction strategy, implemented in DAROGAN, is that it should be able to recognise functional similarities between enzymes that are not similar in structure or sequence. The method has been tested for its ability to predict the cofactor dependencies of enzymes that utilise pyridoxal-5’-phosphate, thiamine, glutathione and folic acid. One area of application for DAROGAN is predicting, in organisms with completed genomes, which genes encode previously described enzyme functions that could not be assigned to any gene or protein sequence through the standard annotation processes. The potential of DAROGAN to propose candidates for the pyridoxal-5’-phosphate-utilising enzymes that are missing from the <i>E. coli</i> genome according to EcoCyc was also investigated. Candidates are proposed by assessing the 511 sequences from the GeneQuiz project that have homologues in other species but uncertain functions. In this assessment, DAROGAN is used to determine the similarity of each sequence to the reference <i>Treads</i>.
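The summary does not state how conserved residues are selected from an alignment, how <i>Treads</i> are projected into multidimensional space, or which distance measure is used. The Python sketch that follows only illustrates the general idea under explicit assumptions, and it is not DAROGAN's actual implementation. It assumes that a column counts as conserved when a single residue occupies at least a fraction `threshold` of all aligned sequences, that a <i>Tread</i> is the multiset of those residues (column positions are discarded, which keeps it unordered), that the projection is the 20-dimensional amino acid composition of the <i>Tread</i>, and that similarity is Euclidean distance. The names `tread_from_alignment`, `project`, `distance` and `nearest_reference`, the threshold value and the toy alignments are all hypothetical.

```python
import math
from collections import Counter

# The 20 standard amino acids; gaps ('-') and other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def tread_from_alignment(alignment, threshold=0.9):
    """Hypothetical Tread construction: the multiset of residues found at
    alignment columns where one residue occupies >= `threshold` of all sequences.
    Column positions are discarded, so the result is unordered."""
    if not alignment:
        return Counter()
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    tread = Counter()
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] in AMINO_ACIDS]
        if not residues:
            continue  # column is entirely gaps
        residue, count = Counter(residues).most_common(1)[0]
        # Conservation is measured against all sequences, so gappy columns do not qualify.
        if count / len(alignment) >= threshold:
            tread[residue] += 1
    return tread


def project(tread):
    """Assumed projection: the Tread's amino acid composition as a 20-d vector."""
    total = sum(tread.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [tread[aa] / total for aa in AMINO_ACIDS]


def distance(a, b):
    """Assumed metric: Euclidean distance between projected Treads."""
    return math.dist(project(a), project(b))


def nearest_reference(query, references):
    """Return the name of the reference Tread closest to the query Tread."""
    return min(references, key=lambda name: distance(query, references[name]))


# Toy, made-up alignments standing in for well-characterised enzyme families.
references = {
    name: tread_from_alignment(aln)
    for name, aln in {
        "family_A": ["GKTDKR", "GKTEKR", "GKSDKR"],
        "family_B": ["HSCNPW", "HSCNPW", "HACNPW"],
    }.items()
}
query = tread_from_alignment(["GKADKR", "GKSDKR"])
print(nearest_reference(query, references))
```

Because the comparison runs on residue composition rather than on aligned positions, two families can score as close even when their sequences cannot be aligned to each other. That is the kind of sequence-independent similarity the summary describes, although the real system may achieve it differently.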