An Experimental Study of Gene Team Problem

Bibliographic Details
Main Authors: Wei-Hsin Wang, 王偉信
Other Authors: Sheng-Lung Peng
Format: Others
Language: en_US
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/01877929717965752549
Description
Summary: Master's === National Dong Hwa University === Department of Computer Science and Information Engineering === 98 === Gene teams (clusters) are groups of homologous genes that are colocated and contiguous along two or more genomes. Recently, gene teams have become a popular tool in comparative genomics for inferring functional or evolutionary relationships. According to the distance allowed between neighboring homologous genes (called the δ value) and the type of homology, the gene team finding problem can be classified into four subproblems: common intervals in permutations, common intervals in sequences, max-gap clusters in permutations, and max-gap clusters in sequences. Here, the δ value serves as a gap-based criterion for extending homologous gene clusters in the genomes. The homology type determines which genome model is used: for orthologs, homologous genes are treated as unique letters that co-occur in the different genomes, and genomes are modeled as permutations; for paralogs, homologous genes are treated as families whose members may occur more than once in a genome, and genomes are modeled as sequences. These tasks are usually called common interval finding problems when δ is equal to zero (no gaps) and max-gap cluster finding problems otherwise. Note that these problems disregard the order of the homologous genes within the genomes. In this thesis, we focus on the problem of max-gap clusters in permutations. We propose a bucket method to implement a straightforward algorithm for this problem. Although our algorithm has worst-case time complexity O(n^2), where n is the number of genes in a genome, it outperforms an O(n log^2 n) algorithm in practice.
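
To make the max-gap cluster definition concrete, the following is a minimal Python sketch of a naive splitting approach for two genomes in which each shared gene occurs exactly once. It is not the thesis's bucket method, nor the O(n log^2 n) algorithm mentioned above; it only illustrates the problem statement. The function names `split_by_gaps` and `gene_teams` are hypothetical, and the sketch assumes that δ counts the genes lying between two neighboring team members, so that δ = 0 means no gaps.

```python
def split_by_gaps(genes, pos, delta):
    """Split genes into maximal runs in which neighbors (ordered by pos)
    have at most `delta` other genes lying between them."""
    ordered = sorted(genes, key=pos.__getitem__)
    runs, current = [], [ordered[0]]
    for gene in ordered[1:]:
        # Assumption: the gap is the number of intervening genes.
        if pos[gene] - pos[current[-1]] - 1 > delta:
            runs.append(current)
            current = [gene]
        else:
            current.append(gene)
    runs.append(current)
    return runs


def gene_teams(genome_a, genome_b, delta):
    """Return the maximal gene sets that stay unbroken, under the max-gap
    criterion, in both genomes. Each shared gene must occur once per genome."""
    pos_a = {gene: i for i, gene in enumerate(genome_a)}
    pos_b = {gene: i for i, gene in enumerate(genome_b)}
    common = set(pos_a) & set(pos_b)

    teams = []
    stack = [list(common)] if common else []
    while stack:
        candidate = stack.pop()
        # Try to break the candidate in genome A first, then in genome B.
        parts = split_by_gaps(candidate, pos_a, delta)
        if len(parts) == 1:
            parts = split_by_gaps(candidate, pos_b, delta)
        if len(parts) == 1:
            # No gap larger than delta in either genome: this is a team.
            teams.append(sorted(candidate))
        else:
            # A team cannot span a gap larger than delta, so refine each part.
            stack.extend(parts)
    return sorted(teams)


# Toy genomes: x1..x3 and y1, y2 stand for genes without a shared homolog.
genome_a = ["a", "b", "x1", "c", "x2", "x3", "d", "e"]
genome_b = ["b", "a", "y1", "y2", "c", "e", "d"]
print(gene_teams(genome_a, genome_b, 1))
print(gene_teams(genome_a, genome_b, 2))
```

In this toy input, raising δ from 1 to 2 lets the gaps between c and its neighbors be bridged in both genomes, so the three small teams merge into one. The sketch re-sorts every candidate set and makes no attempt to match either running time quoted in the summary.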