Summary: | Master's === National Dong Hwa University === Department of Computer Science and Information Engineering === 98 === Gene teams (clusters) are groups of homologous genes that are colocated and contiguous
along two or more genomes. Recently, gene teams have become a popular
tool in comparative genomics for functional and evolutionary studies.
According to the allowed distance between neighboring homologous genes (called the
δ value) and the homology type, the gene team finding problem can be classified into
four subproblems: common intervals in permutations, common intervals in
sequences, max-gap clusters in permutations, and max-gap clusters in sequences.
Here, the δ value serves as a gap-based criterion for extending homologous gene clusters
in the genomes. The homology type determines which genome model
is used: for orthologs, homologous genes are treated as unique letters that co-occur
in the different genomes, and genomes are modeled as permutations; for
paralogs, homologous genes are treated as families whose members may occur more than
once in a genome, and genomes are modeled as sequences. These
tasks are usually called common interval finding problems when δ is equal to zero
(no gaps allowed), and max-gap cluster finding problems otherwise. Note
that these problems disregard the order of the homologous genes within the genomes.
In this thesis, we focus on the problem of max-gap clusters in permutations. We
propose a bucket method to implement a straightforward algorithm for this problem.
Although our algorithm has a worst-case time complexity of O(n^2), where n is the number
of genes in a genome, it outperforms an O(n log^2 n) algorithm in practice.
|
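To make the max-gap cluster problem in permutations concrete, the following is a minimal Python sketch of a plain refinement procedure. It is not the bucket method proposed in the thesis, whose details the abstract does not give. The function names `split_by_gap` and `gene_teams`, the toy genomes, and the gap convention are all assumptions made for illustration: each gene occurs at most once per genome, a gap is the number of genes lying between two consecutive cluster members, and δ = 0 therefore means no gaps. A candidate set is repeatedly split wherever a gap exceeds δ in either genome, and a set that no longer splits is reported as a team.

```python
def split_by_gap(genes, pos, delta):
    """Split a non-empty gene set into runs along one genome.

    Consecutive members of a run have at most `delta` other genes
    between them (assumed gap convention: delta = 0 means adjacent).
    """
    ordered = sorted(genes, key=pos.__getitem__)
    runs = [[ordered[0]]]
    for gene in ordered[1:]:
        gap = pos[gene] - pos[runs[-1][-1]] - 1
        if gap > delta:
            runs.append([gene])      # gap too large: start a new run
        else:
            runs[-1].append(gene)
    return runs


def gene_teams(genome_a, genome_b, delta):
    """Return the max-gap clusters shared by two genomes.

    Assumes every gene occurs at most once per genome (permutation
    model). Genes present in only one genome still occupy a position
    and therefore count toward gaps.
    """
    pos_a = {gene: i for i, gene in enumerate(genome_a)}
    pos_b = {gene: i for i, gene in enumerate(genome_b)}
    shared = [gene for gene in genome_a if gene in pos_b]

    teams = []
    stack = [shared] if shared else []
    while stack:
        candidate = stack.pop()
        for pos in (pos_a, pos_b):
            runs = split_by_gap(candidate, pos, delta)
            if len(runs) > 1:
                stack.extend(runs)   # refine each piece further
                break
        else:
            # No split in either genome: the candidate is a team.
            teams.append(sorted(candidate, key=pos_a.__getitem__))
    return sorted(teams, key=lambda team: pos_a[team[0]])


# Hypothetical toy genomes; x, y, z, w, v have no homolog in the other genome.
genome_a = ["a", "b", "x", "c", "d", "y", "z", "e"]
genome_b = ["b", "a", "c", "w", "v", "d", "e"]
print(gene_teams(genome_a, genome_b, delta=1))  # [['a', 'b', 'c'], ['d'], ['e']]
```

With δ = 1, gene e is cut off in the first genome (two genes separate it from d) and d is cut off in the second, leaving the team {a, b, c}. Raising δ to 2 merges all five shared genes into a single team. The sketch re-sorts every candidate and makes no attempt at efficiency.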