Computing probabilities for common substrings in random strings
Main Author: | |
---|---|
Format: | Others |
Language: | en |
Published: | McGill University, 2006 |
Subjects: | |
Online Access: | http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=99324 |
Summary: | The Common Substring in Random Strings (CSRS) problem is defined as follows: what is the probability that a set of r random strings of length n, generated by a random process P, contains a common substring of length k? In this thesis, we investigate the CSRS problem and introduce two new methods for computing approximate solutions in the cases where the random strings are generated by a Bernoulli or Markov process. We also present generalizations of the methods that compute the probability of finding a common substring among only q of the r random strings, and that allow mismatches in the substring occurrences. We show through simulation experiments that the two new methods provide a substantial improvement in accuracy over previous methods when the problem instance has r > 2 random strings. |
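
To make the problem statement concrete, the CSRS probability can be estimated by brute-force simulation for small instances. The sketch below is not one of the thesis's two methods, which are analytical approximations; it is only a Monte Carlo illustration that assumes a Bernoulli process (each character drawn independently from a fixed distribution). The function names `has_common_substring` and `estimate_csrs`, the default alphabet, and the trial count are all hypothetical choices made for this example.

```python
import random


def has_common_substring(strings, k):
    """Return True if some substring of length k occurs in every string."""
    if any(len(s) < k for s in strings):
        return False
    # Start with all length-k substrings of the first string, then
    # intersect with the length-k substrings of each remaining string.
    common = {strings[0][i:i + k] for i in range(len(strings[0]) - k + 1)}
    for s in strings[1:]:
        common &= {s[i:i + k] for i in range(len(s) - k + 1)}
        if not common:
            return False
    return True


def estimate_csrs(r, n, k, alphabet="ACGT", weights=None, trials=10000, seed=0):
    """Monte Carlo estimate of the CSRS probability under a Bernoulli process.

    Assumption: characters are drawn independently; `weights` gives the
    per-character probabilities (uniform when None).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        strings = [
            "".join(rng.choices(alphabet, weights=weights, k=n))
            for _ in range(r)
        ]
        if has_common_substring(strings, k):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Example: 3 uniform random DNA-like strings of length 50, substring length 4.
    print(estimate_csrs(r=3, n=50, k=4))
```

Simulation of this kind becomes slow and imprecise when the probability is very small, which is one reason approximate analytical methods are of interest.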