Theory of Computing
-------------------
Title : Embedding the Ulam metric into ℓ1
Authors : Moses Charikar and Robert Krauthgamer
Volume : 2
Number : 11
Pages : 207-224
URL : http://www.theoryofcomputing.org/articles/v002a011
Abstract
--------
Edit distance is a fundamental measure of distance between strings,
the extensive study of which has recently focused on computational
problems such as nearest neighbor search, sketching and fast approximation.
A very powerful paradigm is to map the metric space induced by the
edit distance into a normed space (e.g., \ell_1) with small distortion
and then use the rich algorithmic toolkit known for normed spaces.
Although the minimum distortion required to embed edit distance into
\ell_1 has received a lot of attention lately, there is a large gap
between known upper and lower bounds. We make progress on this
question by considering large, well-structured submetrics of the
edit distance metric space.
Our main technical result is that the Ulam metric, namely, the
edit distance on permutations of length at most n, embeds into
\ell_1 with distortion O(log n). This immediately leads to
sketching algorithms with constant size sketches, and to efficient
approximate nearest neighbor search algorithms, with approximation
factor O(log n). The embedding and its algorithmic consequences
present a big improvement over those previously known for the Ulam
metric, and they are significantly better than the state of the art
for edit distance in general. Further, we extend these results for
the Ulam metric to edit distance on strings that are (locally)
non-repetitive, i.e., strings where (close by) substrings are distinct.