Generalised Pattern Matching Revisited

Bartłomiej Dudek; Paweł Gawrychowski; Tatiana Starikovskaya

Data Structures and Algorithms 2020-01-20 v1

Abstract

In the problem of $\texttt{Generalised Pattern Matching}\ (\texttt{GPM})$ [STOC'94, Muthukrishnan and Palem], we are given a text $T$ of length $n$ over an alphabet $\Sigma_T$, a pattern $P$ of length $m$ over an alphabet $\Sigma_P$, and a matching relationship $\subseteq \Sigma_T \times \Sigma_P$, and must return all substrings of $T$ that match $P$ (reporting) or the number of mismatches between each substring of $T$ of length $m$ and $P$ (counting). In this work, we improve over all previously known algorithms for this problem for various parameters describing the input instance:

* $\mathcal{D}\,$ being the maximum number of characters that match a fixed character,
* $\mathcal{S}\,$ being the number of pairs of matching characters,
* $\mathcal{I}\,$ being the total number of disjoint intervals of characters that match the $m$ characters of the pattern $P$.

At the heart of our new deterministic upper bounds for $\mathcal{D}\,$ and $\mathcal{S}\,$ lies a faster construction of superimposed codes, which solves an open problem posed in [FOCS'97, Indyk] and can be of independent interest. To conclude, we demonstrate the first lower bounds for $\texttt{GPM}$. We start by showing that any deterministic or Monte Carlo algorithm for $\texttt{GPM}$ must use $\Omega(\mathcal{S})$ time, and then proceed to show higher lower bounds for combinatorial algorithms. These bounds show that our algorithms are almost optimal, unless a radically new approach is developed.
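To make the problem statement concrete, the following is a minimal brute-force sketch of the reporting and counting variants, together with the parameters $\mathcal{D}$ and $\mathcal{S}$. It is not the algorithm of the paper: it is the naive $O(nm)$ baseline, and the function names, the representation of the matching relationship as a set of (text character, pattern character) pairs, and the example instance are all assumptions made for illustration.

```python
from collections import defaultdict


def gpm_count(text, pattern, matches):
    """Counting variant (naive): for every length-m window of `text`,
    return the number of positions that do not match `pattern`.

    `matches` is assumed to be a set of (text_char, pattern_char) pairs.
    """
    n, m = len(text), len(pattern)
    return [
        sum((text[i + j], pattern[j]) not in matches for j in range(m))
        for i in range(n - m + 1)
    ]


def gpm_report(text, pattern, matches):
    """Reporting variant (naive): starting positions of windows with no mismatch."""
    return [i for i, k in enumerate(gpm_count(text, pattern, matches)) if k == 0]


def relation_parameters(matches):
    """Return (D, S): D is the maximum number of characters matching any
    fixed character (on either side), S is the number of matching pairs."""
    degree = defaultdict(int)
    for text_char, pattern_char in matches:
        degree[("T", text_char)] += 1     # pattern characters matched by this text character
        degree[("P", pattern_char)] += 1  # text characters matched by this pattern character
    return max(degree.values(), default=0), len(matches)


# Hypothetical instance: 'x' matches {a, b}, 'y' matches {b, c}.
example_matches = {("a", "x"), ("b", "x"), ("b", "y"), ("c", "y")}
counts = gpm_count("abcab", "xy", example_matches)
occurrences = gpm_report("abcab", "xy", example_matches)
D, S = relation_parameters(example_matches)
```

On this instance the windows `ab`, `bc`, and the final `ab` match the pattern, while `ca` mismatches in both positions. The parameter $\mathcal{I}$ is left out of the sketch because it depends on an ordering of the alphabet that the abstract does not spell out.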

Keywords

string algorithms, approximation algorithm, graph algorithm

Cite

@article{arxiv.2001.05976,
  title  = {Generalised Pattern Matching Revisited},
  author = {Bartłomiej Dudek and Paweł Gawrychowski and Tatiana Starikovskaya},
  journal= {arXiv preprint arXiv:2001.05976},
  year   = {2020}
}
