Master's thesis, National Sun Yat-sen University (國立中山大學), Graduate Institute of Computer Science and Information Engineering (資訊工程學系研究所), ROC academic year 89.

Multiple sequence alignment (MSA) is a fundamental technique in molecular biology. Biological sequences are aligned vertically against one another to reveal their similarities and differences. Because of its importance, many MSA algorithms have been proposed. With dynamic programming, an optimal alignment of two sequences can be found in $O(n^2)$ time, where $n$ is the length of each sequence. Unfortunately, extending this approach to the general problem of optimally aligning $k$ sequences of length $n$ requires $O(n^k)$ time, which grows exponentially with $k$.
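The $O(n^2)$ bound for two sequences comes from filling a table with one cell per pair of prefixes; aligning $k$ sequences the same way needs a $k$-dimensional table, hence the $O(n^k)$ cost. Below is a minimal Python sketch of the standard pairwise dynamic program (Needleman-Wunsch global alignment). The linear gap penalty, the scores (match +1, mismatch -1, gap -1), and the function name `global_align` are illustrative assumptions, not the scoring scheme used in the thesis.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Fills a (len(a)+1) x (len(b)+1) table, so time and space are O(n^2)
    for two sequences of length n. The default scores are illustrative only.
    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    def sub(x: str, y: str) -> int:
        # Substitution score for aligning character x with character y.
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return score[n][m], ''.join(reversed(out_a)), ''.join(reversed(out_b))
```

For instance, `global_align("ACGT", "AGT")` scores 2 (three matches, one gap), and stripping the gaps from either returned row gives back the original input.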
In this thesis, we first propose an efficient group alignment method that aligns two groups of sequences against each other. We then propose a clustering method that builds the tree topology determining the order in which groups are merged. The clustering method is based on the idea that the two sequences with the largest distance between them should be split into different clusters. In our experiments, our algorithm achieved both better alignment quality and shorter running time than the NJ (neighbor-joining) algorithm and the Clustal W algorithm.
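The clustering idea can be read as a top-down split: find the two sequences that are farthest apart, seed one cluster with each, and recurse until every cluster holds a single sequence. The minimal Python sketch below assumes a precomputed symmetric distance matrix (for example, derived from pairwise alignment scores). The rule that each remaining sequence joins whichever seed it is closer to, the tie-breaking, and the name `split_tree` are assumptions for illustration; the abstract does not specify them.

```python
from typing import List, Optional, Sequence, Union

# A guide tree is either a leaf (sequence index) or a pair of subtrees.
Tree = Union[int, tuple]


def split_tree(dist: Sequence[Sequence[float]], members: Optional[List[int]] = None) -> Tree:
    """Build a binary guide tree top-down by splitting on the farthest pair.

    dist[i][j] is the distance between sequences i and j.
    Assumption: every other member joins the farther-pair seed it is closer to,
    with ties going to the first seed.
    """
    if members is None:
        members = list(range(len(dist)))
    if len(members) == 1:
        return members[0]
    # Find the two members with the largest pairwise distance.
    p, q = max(
        ((a, b) for i, a in enumerate(members) for b in members[i + 1:]),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    left, right = [p], [q]
    for x in members:
        if x in (p, q):
            continue
        (left if dist[x][p] <= dist[x][q] else right).append(x)
    # Both sides are non-empty, so the recursion always makes progress.
    return (split_tree(dist, left), split_tree(dist, right))
```

For example, `split_tree([[0, 1, 5, 6], [1, 0, 5, 6], [5, 5, 0, 2], [6, 6, 2, 0]])` returns `((0, 1), (3, 2))`: sequences 0 and 3 are farthest apart, so they seed the two clusters. A progressive aligner would then merge the leaves bottom-up, aligning groups pairwise at each internal node.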