Generalised Pattern Matching Revisited
Abstract
In the problem of Generalised Pattern Matching (GPM) [STOC'94, Muthukrishnan and Palem], we are given a text $T$ of length $n$ over an alphabet $\Sigma_T$, a pattern $P$ of length $m$ over an alphabet $\Sigma_P$, and a matching relationship $\subseteq \Sigma_T \times \Sigma_P$, and must return all substrings of $T$ that match $P$ (reporting) or the number of mismatches between each substring of $T$ of length $m$ and $P$ (counting). In this work, we improve over all previously known algorithms for this problem for various parameters describing the input instance:
* $\mathcal{D}$ being the maximum number of characters that match a fixed character,
* $\mathcal{S}$ being the number of pairs of matching characters,
* $\mathcal{I}$ being the total number of disjoint intervals of characters that match the $m$ characters of the pattern $P$.
At the heart of our new deterministic upper bounds for $\mathcal{D}$ and $\mathcal{S}$ lies a faster construction of superimposed codes, which solves an open problem posed in [FOCS'97, Indyk] and can be of independent interest. To conclude, we demonstrate the first lower bounds for GPM. We start by showing that any deterministic or Monte Carlo algorithm for GPM must use $\Omega(\mathcal{S})$ time, and then proceed to show higher lower bounds for combinatorial algorithms. These bounds show that our algorithms are almost optimal, unless a radically new approach is developed.
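To make the problem statement concrete, the following is a minimal brute-force sketch of the counting and reporting variants. It is not the algorithm of the paper, only a quadratic-time baseline that follows the definition directly. The function names `gpm_counting` and `gpm_reporting`, and the representation of the matching relationship as a set of (text character, pattern character) pairs, are assumptions made for illustration.

```python
from typing import List, Set, Tuple


def gpm_counting(text: str, pattern: str,
                 matches: Set[Tuple[str, str]]) -> List[int]:
    """Naive baseline: for every window of the text whose length equals
    the pattern length, count positions where the text character does
    not match the pattern character under the given relationship.
    Runs in time proportional to (text length) * (pattern length)."""
    n, m = len(text), len(pattern)
    result = []
    for i in range(n - m + 1):
        # A position is a mismatch if the pair is absent from the relation.
        mismatches = sum(
            1 for j in range(m) if (text[i + j], pattern[j]) not in matches
        )
        result.append(mismatches)
    return result


def gpm_reporting(text: str, pattern: str,
                  matches: Set[Tuple[str, str]]) -> List[int]:
    """Starting positions of windows with zero mismatches."""
    counts = gpm_counting(text, pattern, matches)
    return [i for i, k in enumerate(counts) if k == 0]


# Hypothetical toy instance: 'a' and 'c' match 'x', 'b' matches 'y'.
relation = {("a", "x"), ("c", "x"), ("b", "y")}
counts = gpm_counting("abcab", "xy", relation)
positions = gpm_reporting("abcab", "xy", relation)
```

In this toy instance the relationship contains three matching pairs and each pattern character is matched by at most two text characters; the parameters studied in the paper measure this kind of sparsity of the relationship.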
Cite
@article{arxiv.2001.05976,
title = {Generalised Pattern Matching Revisited},
author = {Bartłomiej Dudek and Paweł Gawrychowski and Tatiana Starikovskaya},
journal = {arXiv preprint arXiv:2001.05976},
year = {2020}
}