
Efficiently Approximating Edit Distance Between Pseudorandom Strings

Data Structures and Algorithms 2018-11-13 v1

Abstract

We present an algorithm for approximating the edit distance $\operatorname{ed}(x, y)$ between two strings $x$ and $y$ in time parameterized by the degree to which one of the strings $x$ satisfies a natural pseudorandomness property. The pseudorandomness model is asymmetric in that no requirements are placed on the second string $y$, which may be constructed by an adversary with full knowledge of $x$. We say that $x$ is \emph{$(p, B)$-pseudorandom} if all pairs $a$ and $b$ of disjoint $B$-letter substrings of $x$ satisfy $\operatorname{ed}(a, b) \ge pB$. Given parameters $p$ and $B$, our algorithm computes the edit distance between a $(p, B)$-pseudorandom string $x$ and an arbitrary string $y$ within a factor of $O(1/p)$ in time $\tilde{O}(nB)$, with high probability. Our algorithm is robust in the sense that it can handle a small portion of $x$ being adversarial (i.e., not satisfying the pseudorandomness property). In this case, the algorithm incurs an additive approximation error proportional to the fraction of $x$ which behaves maliciously. The asymmetry of our pseudorandomness model has particular appeal for the case where $x$ is a \emph{source string}, meaning that $\operatorname{ed}(x, y)$ will be computed for many strings $y$. Suppose that one wishes to achieve an $O(\alpha)$-approximation for each $\operatorname{ed}(x, y)$ computation, and that $B$ is the smallest block-size for which the string $x$ is $(1/\alpha, B)$-pseudorandom. We show that without knowing $B$ beforehand, $x$ may be preprocessed in time $\tilde{O}(n^{1.5}\sqrt{B})$, so that all future computations of the form $\operatorname{ed}(x, y)$ may be $O(\alpha)$-approximated in time $\tilde{O}(nB)$. Furthermore, for the special case where only a single $\operatorname{ed}(x, y)$ computation will be performed, we show how to achieve an $O(\alpha)$-approximation in time $\tilde{O}(n^{4/3}B^{2/3})$.
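To make the $(p, B)$-pseudorandomness definition concrete, the following is a minimal brute-force sketch that checks the property directly from its statement. It is not the paper's algorithm and makes no attempt at the stated running times; the function names `edit_distance` and `is_pseudorandom` are hypothetical, and edit distance is assumed to be the standard Levenshtein distance (unit-cost insertions, deletions, and substitutions).

```python
def edit_distance(a: str, b: str) -> int:
    """Textbook O(|a| * |b|) dynamic program for Levenshtein distance."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]


def is_pseudorandom(x: str, p: float, B: int) -> bool:
    """Return True if every pair of disjoint B-letter substrings a, b of x
    satisfies ed(a, b) >= p * B (illustrative brute-force check only).

    If x has no two disjoint B-letter substrings, the property holds vacuously.
    """
    n = len(x)
    for i in range(n - B + 1):
        a = x[i:i + B]
        # Start j at i + B so that the second block does not overlap the first.
        for j in range(i + B, n - B + 1):
            if edit_distance(a, x[j:j + B]) < p * B:
                return False
    return True
```

For instance, under this sketch `is_pseudorandom("abab", 0.5, 2)` is false because the disjoint blocks `"ab"` and `"ab"` are identical, while `is_pseudorandom("abcd", 1.0, 2)` is true because the only disjoint pair, `"ab"` and `"cd"`, is at edit distance 2.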

Cite

@article{arxiv.1811.04300,
  title  = {Efficiently Approximating Edit Distance Between Pseudorandom Strings},
  author = {William Kuszmaul},
  journal= {arXiv preprint arXiv:1811.04300},
  year   = {2018}
}

Comments

SODA 2019
