Algorithms for Comparing Run-Length Encoded Strings without Decoding


Bibliographic Details
Main Authors: Kuan-Yu Chen, 陳冠宇
Other Authors: 趙坤茂
Format: Others
Language: en_US
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/98024825139801717013
Description
Summary: Doctoral === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 99 === A recent trend in stringology combines pattern matching with data compression, giving rise to the so-called compressed pattern matching problem. The ultimate goal of this line of investigation is to design algorithms that operate on encoded strings without any decoding step. The compression scheme considered in this dissertation is run-length encoding. Despite the simplicity of this scheme, the only positive result before this work was the computation of the indel distance (the dual of the longest common subsequence) of two run-length encoded strings in O(mn log mn) time, where m and n are the numbers of runs in the input strings. This dissertation explores both comparing run-length encoded strings and identifying featured patterns in them. None of the presented algorithms contains a decoding step, and one of them is the first known algorithm that computes the Levenshtein distance of two run-length encoded strings without decoding. Several lower bounds are established by reduction from either the 3SUM problem or the problem of sorting pairwise sums, implying conjectured time bounds of Ω(mn) and Ω(mn log m), respectively. We believe that the work accomplished in this dissertation sheds some light on solutions to aligning two run-length encoded strings without decoding.
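
To make the terms in the summary concrete, the following Python sketch shows run-length encoding and a naive baseline for the indel distance, computed as |A| + |B| - 2·LCS(A, B). This is not the dissertation's method: the baseline decodes both inputs first, which is exactly the step the dissertation's algorithms avoid, and its running time depends on the decoded lengths rather than on the run counts m and n. All function names here are hypothetical and chosen only for illustration.

```python
from itertools import groupby


def rle_encode(s):
    """Return the run-length encoding of s as a list of (char, run_length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def rle_decode(runs):
    """Expand a list of (char, run_length) pairs back into a plain string."""
    return "".join(ch * length for ch, length in runs)


def lcs_length(a, b):
    """Length of the longest common subsequence, by the textbook row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def indel_distance_by_decoding(runs_a, runs_b):
    """Naive baseline (assumption: decoding is allowed): insertions plus deletions
    needed to turn one string into the other, i.e. |A| + |B| - 2 * LCS(A, B)."""
    a, b = rle_decode(runs_a), rle_decode(runs_b)
    return len(a) + len(b) - 2 * lcs_length(a, b)
```

For example, "aaabb" encodes to two runs, (a, 3) and (b, 2), so m = 2 even though the decoded string has five characters. Bounds stated in terms of m and n, such as the O(mn log mn) result cited in the summary, can therefore be much smaller than the cost of this decode-then-compare baseline when runs are long.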