Faster Pattern Matching under Edit Distance
Abstract
We consider the approximate pattern matching problem under the edit distance. Given a text $T$ of length $n$, a pattern $P$ of length $m$, and a threshold $k$, the task is to find the starting positions of all substrings of $T$ that can be transformed to $P$ with at most $k$ edits. More than 20 years ago, Cole and Hariharan [SODA'98, SIAM J. Comput.'02] gave an $\mathcal{O}(n + k^4 \cdot n/m)$-time algorithm for this classic problem, and this runtime has not been improved since. Here, we present an algorithm that runs in time $\mathcal{O}(n + k^{3.5} \sqrt{\log m \log k} \cdot n/m)$, thus breaking through this long-standing barrier. In the case where $n^{1/4+\varepsilon} \le k \le n^{2/5-\varepsilon}$ for some arbitrarily small positive constant $\varepsilon$, our algorithm improves over the state of the art by polynomial factors: it is polynomially faster than both the algorithm of Cole and Hariharan and the classic $\mathcal{O}(kn)$-time algorithm of Landau and Vishkin [STOC'86, J. Algorithms'89]. We observe that the bottleneck case of the alternative $\mathcal{O}(n + k^4 \cdot n/m)$-time algorithm of Charalampopoulos, Kociumaka, and Wellnitz [FOCS'20] is when the text and the pattern are (almost) periodic. Our new algorithm reduces this case to a new dynamic problem (Dynamic Puzzle Matching), which we solve by building on tools developed by Tiskin [SODA'10, Algorithmica'15] for the so-called seaweed monoid of permutation matrices. Our algorithm relies only on a small set of primitive operations on strings and thus also applies to the fully-compressed setting (where text and pattern are given as straight-line programs) and to the dynamic setting (where we maintain a collection of strings under creation, splitting, and concatenation), improving over the state of the art.
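To make the problem statement concrete, the following is a minimal reference sketch in Python. It is not the algorithm of this paper, nor that of Cole and Hariharan or Landau and Vishkin; it is the textbook quadratic-time dynamic program (often attributed to Sellers), run on the reversed strings so that it reports starting positions rather than ending positions. The function name `k_edit_occurrence_starts` and the 0-based indexing convention are assumptions made for illustration only.

```python
def k_edit_occurrence_starts(text, pattern, k):
    """Return the sorted 0-based positions i such that some substring text[i:j]
    can be transformed into pattern with at most k edits
    (insertions, deletions, substitutions).

    Baseline O(len(text) * len(pattern)) dynamic program, for illustration only.
    """
    # Reversing both strings turns "starting positions" into "ending positions",
    # which the classic column-by-column DP reports directly.
    t, p = text[::-1], pattern[::-1]
    n, m = len(t), len(p)

    # col[j] = minimum edit distance between p[:j] and some substring of t
    # ending at the current position e (initially e = 0, the empty substring).
    col = list(range(m + 1))
    starts = []
    if col[m] <= k:
        starts.append(n)  # the empty substring at the very end of text

    for e in range(1, n + 1):
        new = [0] * (m + 1)  # new[0] = 0: a match may begin at any position
        for j in range(1, m + 1):
            cost = 0 if p[j - 1] == t[e - 1] else 1
            new[j] = min(
                col[j - 1] + cost,  # match or substitute
                col[j] + 1,         # extra character in the text substring
                new[j - 1] + 1,     # missing character in the text substring
            )
        col = new
        if col[m] <= k:
            starts.append(n - e)  # end e in reversed text = start n - e in text

    return sorted(starts)


if __name__ == "__main__":
    print(k_edit_occurrence_starts("abcabc", "abc", 0))
    print(k_edit_occurrence_starts("abcabc", "abc", 1))
```

With $k = 0$ this degenerates to exact pattern matching. The sketch costs $\mathcal{O}(nm)$ time regardless of $k$, which is the kind of baseline that the bounds quoted in the abstract improve upon when $k$ is small relative to $m$.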
Cite
@article{arxiv.2204.03087,
title = {Faster Pattern Matching under Edit Distance},
author = {Panagiotis Charalampopoulos and Tomasz Kociumaka and Philip Wellnitz},
journal = {arXiv preprint arXiv:2204.03087},
year = {2022}
}
Comments
94 pages, 7 figures