Sublinear-Time Algorithms for Computing & Embedding Gap Edit Distance

Data Structures and Algorithms 2020-11-17 v2

Abstract

In this paper, we design new sublinear-time algorithms for solving the gap edit distance problem and for embedding edit distance to Hamming distance. For the gap edit distance problem, we give an $\tilde{O}(\frac{n}{k}+k^2)$-time greedy algorithm that distinguishes between length-$n$ input strings with edit distance at most $k$ and those with edit distance exceeding $(3k+5)k$. This is an improvement and a simplification upon the result of Goldenberg, Krauthgamer, and Saha [FOCS 2019], where the $k$ vs $\Theta(k^2)$ gap edit distance problem is solved in $\tilde{O}(\frac{n}{k}+k^3)$ time. We further generalize our result to solve the $k$ vs $k'$ gap edit distance problem in time $\tilde{O}(\frac{nk}{k'}+k^2+\frac{k^2}{k'}\sqrt{nk})$, strictly improving upon the previously known bound $\tilde{O}(\frac{nk}{k'}+k^3)$. Finally, we show that if the input strings do not have long highly periodic substrings, then already the $k$ vs $(1+\epsilon)k$ gap edit distance problem can be solved in sublinear time. Specifically, if the strings contain no substring of length $\ell$ with period at most $2k$, then the running time we achieve is $\tilde{O}(\frac{n}{\epsilon^2 k}+k^2\ell)$. We further give the first sublinear-time probabilistic embedding of edit distance to Hamming distance. For any parameter $p$, our $\tilde{O}(\frac{n}{p})$-time procedure yields an embedding with distortion $O(kp)$, where $k$ is the edit distance of the original strings. Specifically, the Hamming distance of the resultant strings is between $\frac{k-p+1}{p+1}$ and $O(k^2)$ with good probability. This generalizes the linear-time embedding of Chakraborty, Goldenberg, and Koucký [STOC 2016], where the resultant Hamming distance is between $\frac{k}{2}$ and $O(k^2)$. Our algorithm is based on a random walk over samples, which we believe will find other applications in sublinear-time algorithms.
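To make the problem statement concrete, the following is a minimal sketch of an exact, non-sublinear baseline for the $k$ vs $k'$ gap edit distance problem. It is not the paper's greedy algorithm: it uses the classical banded dynamic program, which takes roughly $O(nk)$ time, and only illustrates what a gap algorithm is required to output. The function names `bounded_edit_distance` and `gap_edit_distance` are hypothetical and chosen for this illustration.

```python
from typing import Optional


def bounded_edit_distance(x: str, y: str, k: int) -> Optional[int]:
    """Return the edit distance of x and y if it is at most k, else None.

    Classical banded dynamic program: only cells (i, j) with |i - j| <= k
    are filled, because any cell outside the band has cost greater than k.
    """
    n, m = len(x), len(y)
    if abs(n - m) > k:
        return None
    inf = k + 1  # any value above k is treated as "too far"
    # Row 0 of the DP table: turning the empty prefix of x into y[:j] costs j.
    prev = {j: j for j in range(min(m, k) + 1)}
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if j == 0:
                best = i  # delete all i characters of x[:i]
            else:
                # Substitute (or match) x[i-1] with y[j-1].
                best = prev.get(j - 1, inf) + (x[i - 1] != y[j - 1])
                # Insert y[j-1].
                best = min(best, cur.get(j - 1, inf) + 1)
            # Delete x[i-1].
            best = min(best, prev.get(j, inf) + 1)
            cur[j] = min(best, inf)
        prev = cur
    d = prev.get(m, inf)
    return d if d <= k else None


def gap_edit_distance(x: str, y: str, k: int, k_prime: int) -> str:
    """Exact baseline for the k vs k' gap problem (assumes k <= k_prime).

    Must answer "close" when ED(x, y) <= k and "far" when ED(x, y) > k_prime;
    for distances in between, either answer is acceptable. This baseline
    simply thresholds at k, so k_prime is unused beyond the sanity check.
    """
    if k > k_prime:
        raise ValueError("expected k <= k_prime")
    return "close" if bounded_edit_distance(x, y, k) is not None else "far"
```

For example, `gap_edit_distance(x, y, k, (3 * k + 5) * k)` answers the instance of the gap problem described in the abstract, but by computing the distance exactly up to threshold $k$; the point of a sublinear-time algorithm is to exploit the slack between $k$ and $k'$ so that most of the input never has to be read.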

Cite

@article{arxiv.2007.12762,
  title  = {Sublinear-Time Algorithms for Computing \& Embedding Gap Edit Distance},
  author = {Tomasz Kociumaka and Barna Saha},
  journal= {arXiv preprint arXiv:2007.12762},
  year   = {2020}
}