Efficient Algorithms for the Constrained Longest Common Subsequence Problems

Bibliographic Details
Main Authors: Yi-Ching Chen, 陳怡靜
Other Authors: 趙坤茂
Format: Others
Language: en_US
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/93278734584938381425
Description
Summary: Ph.D. === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 98 === This dissertation studies several variants of the longest common subsequence (abbreviated LCS) problem. These variants arise from applications and theoretical questions in molecular biology and sequence comparison. In the first part of this dissertation, we study four constrained LCS (abbreviated CLCS) problems, each of which asks for a longest sequence that is a common subsequence of two sequences and either includes or excludes a constrained pattern as a subsequence or substring. We investigate the optimality principles of these problems and then derive a dynamic programming algorithm for each problem. The theoretical analyses show that the time complexity of each algorithm is proportional to the product of the lengths of the given sequences and the constrained pattern. We also consider the case where the number of constrained patterns in each problem is arbitrary. To make the similarity measurement of sequences more flexible, in the second part of this dissertation, we study the problem of finding a longest sequence that is a common subsequence of two sequences and that includes one constrained pattern as a subsequence while excluding another constrained pattern as a subsequence. We give a dynamic programming algorithm whose time complexity is proportional to the product of the lengths of the given sequences and constrained patterns. We also present a fast algorithm that restricts the computation to the positions of matches between the sequences. In the last part of this dissertation, we consider a commonly used data compression scheme called run-length encoding (abbreviated RLE), applied to the input sequences of the LCS problem and of one of the CLCS problems.
To solve the LCS problem of two RLE sequences, we investigate the properties of the partition that the runs of the two sequences induce in the dynamic programming matrix for the LCS problem, and we exploit the sequences to compute the length of an LCS by utilizing the simplicity of some positions. Finally, we devise two algorithms for the problem of finding a longest sequence that is a common subsequence of two RLE sequences and includes a constrained RLE pattern as a subsequence.
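
As an illustration of the kind of dynamic programming the summary describes, the following is a minimal sketch of one CLCS variant: the longest common subsequence of two sequences that includes a constrained pattern as a subsequence. It is a textbook-style three-dimensional recurrence and is not taken from the dissertation; the function name clcs_length and the variables x, y, p and f are assumptions made for this example. Its running time is proportional to the product of the lengths of x, y and p, which matches the complexity stated in the summary.

```python
def clcs_length(x, y, p):
    """Length of a longest common subsequence of x and y that contains p
    as a subsequence, or None if no such common subsequence exists.

    Illustrative sketch only (hypothetical helper, not the author's code).
    f[i][j][k] = best length over common subsequences of x[:i] and y[:j]
    that contain p[:k] as a subsequence; -inf marks "infeasible".
    """
    neg = float("-inf")
    n, m, r = len(x), len(y), len(p)
    f = [[[neg] * (r + 1) for _ in range(m + 1)] for _ in range(n + 1)]

    # With an empty pattern prefix, the empty sequence is always feasible.
    for i in range(n + 1):
        f[i][0][0] = 0
    for j in range(m + 1):
        f[0][j][0] = 0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for k in range(r + 1):
                # Option 1: drop the last symbol of x or of y.
                best = max(f[i - 1][j][k], f[i][j - 1][k])
                if x[i - 1] == y[j - 1]:
                    # Option 2: use the match without advancing the pattern.
                    best = max(best, f[i - 1][j - 1][k] + 1)
                    # Option 3: use the match to cover pattern symbol p[k-1].
                    if k > 0 and x[i - 1] == p[k - 1]:
                        best = max(best, f[i - 1][j - 1][k - 1] + 1)
                f[i][j][k] = best

    result = f[n][m][r]
    return None if result == neg else result
```

For example, clcs_length("abcd", "dabc", "") returns 3 (the unconstrained LCS "abc"), while clcs_length("abcd", "dabc", "d") returns 1, because any common subsequence containing "d" cannot also contain "a", "b" or "c". The table uses space proportional to the product of the three lengths; a sketch that keeps only two layers in i would reduce this, at the cost of readability.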