Computational identification of missing enzymatic gene based on conservation profile of correlated gene clusters

Bibliographic Details
Main Authors: Hon-Wei Chen, 陳虹瑋
Other Authors: Der-Ming Liou
Format: Others
Language: en_US
Published: 2004
Online Access: http://ndltd.ncl.edu.tw/handle/12340195954866313648
Description
Summary: Master's thesis === National Yang-Ming University === Institute of Health Informatics and Decision Making === 92 === Most genome annotations come primarily from biochemical experimentation. With the growing number of fully sequenced and annotated genomes, however, uncharacterized genes can be assigned functions automatically by searching for sequence similarity to well-characterized genes in other organisms. Although sequence similarity-based tools such as BLAST and FASTA account for most successful functional assignments, they fail to identify functions for many genes and can even introduce annotation errors. Genes that participate in metabolism but have no assigned function, or an incorrect one, give rise to the problem of missing enzymes in metabolic pathways. Because of missing enzymes, pathway reconstructions for a specific organism often lack precision and give biologists incomplete pathway knowledge, so follow-up applications such as cross-organism pathway comparison or pathway simulation accumulate further errors. Traditionally, resolving the problem requires additional experiments, which demand considerable effort and time. To address the problem of missing enzymes in metabolism, we propose a computational approach that identifies conservation profiles of gene clusters that both have similar chromosomal arrangements and are functionally coupled in metabolic pathways shared among multiple organisms. Metabolic pathway diagrams and annotations of 157 prokaryotic genomes were obtained from KEGG (Kyoto Encyclopedia of Genes and Genomes). Enzymatic genes that are involved in a pathway shared by the selected organisms and located at neighboring positions on the chromosome are grouped together as a gene cluster. To allow for genome rearrangement, the gene order within each correlated gene cluster may differ between organisms. Every gene in a correlated gene cluster must be present in all of the selected organisms.
A conservation profile of the gene clusters shared among multiple organisms is recognized automatically and presented graphically on both chromosome and pathway maps. The profile can be used to investigate evolutionary genome dynamics and to improve the quality of genome annotation by identifying missing enzymatic genes. In this thesis, we present a case study showing how missing enzymatic genes can be identified with our computational approach. The results show two missing enzymatic genes that we discovered in the glycolysis pathway of Bacillus cereus ATCC 14579. These results can be used to fill gaps in the annotation of this organism.
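The clustering idea described in the summary can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the `max_gap` neighborhood threshold, the representation of a genome as `(position, EC number)` pairs, and the toy data are all assumptions made for illustration. The sketch groups pathway genes that sit close together on a chromosome, keeps the EC-number sets that form a cluster in every reference organism regardless of gene order, and then reports which enzymes a target organism lacks, together with unannotated genes lying inside the matching cluster.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Assumed representation: (gene index along the chromosome, EC number or None).
Gene = Tuple[int, Optional[str]]


def neighbor_clusters(genes: List[Gene], pathway_ecs: Set[str],
                      max_gap: int = 2) -> List[List[Gene]]:
    """Group pathway genes whose positions lie within max_gap of the previous hit."""
    hits = sorted((pos, ec) for pos, ec in genes if ec in pathway_ecs)
    clusters: List[List[Gene]] = []
    current: List[Gene] = []
    for pos, ec in hits:
        if current and pos - current[-1][0] > max_gap:
            clusters.append(current)
            current = []
        current.append((pos, ec))
    if current:
        clusters.append(current)
    return clusters


def conserved_profile(genomes: Dict[str, List[Gene]], pathway_ecs: Set[str],
                      max_gap: int = 2) -> Set[FrozenSet[str]]:
    """EC sets that occur as a cluster in every organism; gene order is ignored."""
    per_org = [
        {frozenset(ec for _, ec in c)
         for c in neighbor_clusters(genes, pathway_ecs, max_gap)}
        for genes in genomes.values()
    ]
    return set.intersection(*per_org) if per_org else set()


def missing_enzyme_candidates(profile: Set[FrozenSet[str]], target: List[Gene],
                              max_gap: int = 2) -> List[Tuple[List[str], List[int]]]:
    """For each conserved EC set, list the ECs absent from the target's best
    matching cluster and the unannotated genes located in or near that cluster."""
    results = []
    for ec_set in profile:
        clusters = neighbor_clusters(target, set(ec_set), max_gap)
        if not clusters:
            continue
        best = max(clusters, key=lambda c: len({ec for _, ec in c}))
        missing = ec_set - {ec for _, ec in best}
        if not missing:
            continue
        lo, hi = best[0][0] - max_gap, best[-1][0] + max_gap
        candidates = [pos for pos, ec in target if ec is None and lo <= pos <= hi]
        results.append((sorted(missing), candidates))
    return results


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    pathway = {"2.7.1.11", "4.1.2.13", "5.3.1.1"}
    references = {
        "orgA": [(1, "2.7.1.11"), (2, "4.1.2.13"), (3, "5.3.1.1"), (50, "1.1.1.1")],
        "orgB": [(7, "5.3.1.1"), (8, "2.7.1.11"), (9, "4.1.2.13")],
    }
    target = [(20, "2.7.1.11"), (21, None), (22, "5.3.1.1"), (40, None)]
    profile = conserved_profile(references, pathway)
    print(missing_enzyme_candidates(profile, target))
```

In this toy run, the unannotated gene at position 21 is flagged as a candidate for the enzyme that the target cluster lacks. A real analysis would also need sequence-level evidence before assigning a function, which the sketch does not attempt.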