The Computational Hardness of Estimating Edit Distance

We prove the first nontrivial communication complexity lower bound for the problem of estimating the edit distance (aka Levenshtein distance) between two strings. To the best of our knowledge, this is the first computational setting in which the complexity of estimating the edit distance is provably...

Bibliographic Details
Main Authors: Andoni, Alexandr (Contributor), Krauthgamer, Robert (Author)
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory (Contributor)
Format: Article
Language: English
Published: Society for Industrial and Applied Mathematics, 2010-09-01T20:29:39Z.
Online Access: Get fulltext (http://hdl.handle.net/1721.1/58102)
LEADER 02177 am a22002053u 4500
001 58102
042 |a dc 
100 1 0 |a Andoni, Alexandr  |e author 
100 1 0 |a Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory  |e contributor 
100 1 0 |a Andoni, Alexandr  |e contributor 
700 1 0 |a Krauthgamer, Robert  |e author 
245 0 0 |a The Computational Hardness of Estimating Edit Distance 
246 3 3 |a THE COMPUTATIONAL HARDNESS OF ESTIMATING EDIT DISTANCE 
260 |b Society for Industrial and Applied Mathematics,   |c 2010-09-01T20:29:39Z. 
856 |z Get fulltext  |u http://hdl.handle.net/1721.1/58102 
520 |a We prove the first nontrivial communication complexity lower bound for the problem of estimating the edit distance (aka Levenshtein distance) between two strings. To the best of our knowledge, this is the first computational setting in which the complexity of estimating the edit distance is provably larger than that of Hamming distance. Our lower bound exhibits a trade-off between approximation and communication, asserting, for example, that protocols with $O(1)$ bits of communication can obtain only approximation $\alpha\geq\Omega(\log d/\log\log d)$, where $d$ is the length of the input strings. This case of $O(1)$ communication is of particular importance since it captures constant-size sketches as well as embeddings into spaces like $l_1$ and squared-$l_2$, two prevailing algorithmic approaches for dealing with edit distance. Indeed, the known nontrivial communication upper bounds are all derived from embeddings into $l_1$. By excluding low-communication protocols for edit distance, we rule out a strictly richer class of algorithms than previous results. Furthermore, our lower bound holds not only for strings over a binary alphabet but also for strings that are permutations (aka the Ulam metric). For this case, our bound nearly matches an upper bound known via embedding the Ulam metric into $l_1$. Our proof uses a new technique that relies on Fourier analysis in a rather elementary way. 
546 |a en_US 
655 7 |a Article 
773 |t SIAM Journal on Computing
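
Illustrative note: the abstract contrasts edit distance (Levenshtein distance) with Hamming distance. The following is a minimal sketch of the two definitions only, not the paper's method; the function names and the example strings are assumptions chosen for illustration. It uses the textbook dynamic program for exact edit distance and shows a pair of binary strings that are far apart in Hamming distance but close in edit distance.

```python
def hamming_distance(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(a != b for a, b in zip(x, y))


def edit_distance(x: str, y: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning x into y.

    Textbook dynamic program (quadratic time), keeping only one previous row.
    """
    prev = list(range(len(y) + 1))  # distance from the empty prefix of x
    for i, a in enumerate(x, start=1):
        curr = [i]  # distance from x[:i] to the empty prefix of y
        for j, b in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from x
                curr[j - 1] + 1,         # insert b into x
                prev[j - 1] + (a != b),  # substitute a by b (free on a match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical example: y is x shifted cyclically by one position.
x = "01" * 8
y = "10" * 8
print(hamming_distance(x, y), edit_distance(x, y))
```

Here every position differs, so the Hamming distance equals the string length, while one deletion plus one insertion suffices for the edit distance. Shifts of this kind are one informal reason the two metrics behave so differently.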