Computing probabilities for common substrings in random strings

Bibliographic Details
Main Author: Blais, Éric.
Format: Others
Language: en
Published: McGill University 2006
Subjects:
Online Access: http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=99324
Description
Summary: The Common Substring in Random Strings (CSRS) problem is defined as follows: what is the probability that a set of r random strings of length n, generated by a random process P, contains a common substring of length k? In this thesis, we investigate the CSRS problem and introduce two new methods for computing approximate solutions in the cases where the random strings are generated by a Bernoulli or Markov process. We also present generalizations of these methods that compute the probability of finding a common substring among only q of the r random strings, and that allow mismatches in the substring occurrences. We show through simulation experiments that the two new methods provide a substantial improvement in accuracy over previous methods when the problem instance has r > 2 random strings.
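
To make the problem statement concrete, the following is a minimal Monte Carlo sketch of the CSRS probability for a Bernoulli process (each character drawn independently from a fixed distribution). It is not one of the two approximation methods introduced in the thesis; it only illustrates the kind of simulation against which such approximations could be compared. The function names, the default alphabet "ACGT", the trial count, and the seed are all assumptions made for illustration.

```python
import random


def common_substring_exists(strings, k):
    """Return True if some substring of length k occurs in every string."""
    common = None
    for s in strings:
        # All length-k substrings of s (empty set if len(s) < k).
        kmers = {s[i:i + k] for i in range(len(s) - k + 1)}
        common = kmers if common is None else common & kmers
        if not common:
            return False
    return True


def estimate_csrs(r, n, k, alphabet="ACGT", weights=None, trials=2000, seed=0):
    """Estimate by simulation the probability that r random strings of
    length n share a common substring of length k.

    Assumption: a Bernoulli process, i.e. characters are drawn independently
    from `alphabet` with the given `weights` (uniform if None).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        strings = [
            "".join(rng.choices(alphabet, weights=weights, k=n))
            for _ in range(r)
        ]
        if common_substring_exists(strings, k):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Example instance with r = 3 strings; the parameter values are arbitrary.
    print(estimate_csrs(r=3, n=50, k=5))
```

The estimate converges slowly as the number of trials grows, and each trial costs time proportional to r times n, which is one reason approximate analytical methods are of interest for this problem.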