Sparse Regular Expression Matching

Philip Bille; Inge Li Gørtz

Data Structures and Algorithms 2023-11-07 v7

Abstract

A regular expression specifies a set of strings formed by single characters combined with concatenation, union, and Kleene star operators. Given a regular expression $R$ and a string $Q$, the regular expression matching problem is to decide if $Q$ matches any of the strings specified by $R$. Regular expressions are a fundamental concept in formal languages, and regular expression matching is a basic primitive for searching and processing data. A standard textbook solution [Thompson, CACM 1968] constructs and simulates a nondeterministic finite automaton (NFA), leading to an $O(nm)$ time algorithm, where $n$ is the length of $Q$ and $m$ is the length of $R$. Despite considerable research efforts, only polylogarithmic improvements of this bound are known. Recently, conditional lower bounds provided evidence for this lack of progress when Backurs and Indyk [FOCS 2016] proved that, assuming the strong exponential time hypothesis (SETH), regular expression matching cannot be solved in $O((nm)^{1-\epsilon})$ time for any constant $\epsilon > 0$. Hence, the complexity of regular expression matching is essentially settled in terms of $n$ and $m$. In this paper, we take a new approach and introduce a density parameter, $\Delta$, that captures the amount of nondeterminism in the NFA simulation on $Q$. The density is at most $nm+1$ but can be significantly smaller. Our main result is a new algorithm that solves regular expression matching in $O\left(\Delta \log \log \frac{nm}{\Delta} + n + m\right)$ time. This essentially replaces $nm$ with $\Delta$ in the complexity of regular expression matching. We complement our upper bound with a matching conditional lower bound, which proves that, assuming SETH, regular expression matching cannot be solved in $O(\Delta^{1-\epsilon})$ time for any constant $\epsilon > 0$.
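
To make the textbook baseline concrete, the following is a minimal Python sketch of a Thompson-style NFA construction and state-set simulation. It is an illustration under stated assumptions, not the algorithm of this paper: the names `NFA`, `build_nfa`, `eps_closure`, and `match` are hypothetical, the pattern syntax is restricted to literal characters, `|`, `*`, and parentheses, and the returned count of active states is only an informal proxy for the amount of nondeterminism; the paper's precise definition of $\Delta$ may differ.

```python
from typing import List, Optional, Set, Tuple


class NFA:
    """Epsilon-NFA where each state has at most one character transition."""

    def __init__(self) -> None:
        self.eps: List[List[int]] = []                   # epsilon edges per state
        self.char: List[Optional[Tuple[str, int]]] = []  # (symbol, target) or None

    def new_state(self) -> int:
        self.eps.append([])
        self.char.append(None)
        return len(self.eps) - 1


def build_nfa(pattern: str) -> Tuple[NFA, int, int]:
    """Recursive-descent parse of `pattern` into an NFA; returns (nfa, start, accept)."""
    nfa, pos = NFA(), 0

    def parse_union() -> Tuple[int, int]:
        nonlocal pos
        start, accept = parse_concat()
        while pos < len(pattern) and pattern[pos] == "|":
            pos += 1
            s2, a2 = parse_concat()
            s, a = nfa.new_state(), nfa.new_state()
            nfa.eps[s] += [start, s2]      # branch into either alternative
            nfa.eps[accept].append(a)
            nfa.eps[a2].append(a)
            start, accept = s, a
        return start, accept

    def parse_concat() -> Tuple[int, int]:
        start = accept = nfa.new_state()   # fragment matching the empty string
        while pos < len(pattern) and pattern[pos] not in "|)":
            s2, a2 = parse_star()
            nfa.eps[accept].append(s2)     # chain the fragments
            accept = a2
        return start, accept

    def parse_star() -> Tuple[int, int]:
        nonlocal pos
        start, accept = parse_atom()
        while pos < len(pattern) and pattern[pos] == "*":
            pos += 1
            s, a = nfa.new_state(), nfa.new_state()
            nfa.eps[s] += [start, a]       # enter the loop or skip it
            nfa.eps[accept] += [start, a]  # repeat the loop or leave it
            start, accept = s, a
        return start, accept

    def parse_atom() -> Tuple[int, int]:
        nonlocal pos
        if pattern[pos] == "(":
            pos += 1
            frag = parse_union()
            if pos >= len(pattern) or pattern[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return frag
        if pattern[pos] == "*":
            raise ValueError("'*' has nothing to repeat")
        s, a = nfa.new_state(), nfa.new_state()
        nfa.char[s] = (pattern[pos], a)    # literal character transition
        pos += 1
        return s, a

    start, accept = parse_union()
    if pos != len(pattern):
        raise ValueError("unbalanced ')'")
    return nfa, start, accept


def eps_closure(nfa: NFA, states: Set[int]) -> Set[int]:
    """All states reachable from `states` via epsilon edges only."""
    seen, stack = set(states), list(states)
    while stack:
        for t in nfa.eps[stack.pop()]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def match(pattern: str, text: str) -> Tuple[bool, int]:
    """Return (does the whole text match?, total number of active states)."""
    nfa, start, accept = build_nfa(pattern)
    current = eps_closure(nfa, {start})
    active = len(current)
    for c in text:
        moved = set()
        for s in current:
            edge = nfa.char[s]
            if edge is not None and edge[0] == c:
                moved.add(edge[1])
        current = eps_closure(nfa, moved)
        active += len(current)             # informal proxy, see lead-in
    return accept in current, active
```

For example, `match("(a|b)*abb", "ababb")` reports a match, while a text such as `"zzzzz"` empties the state set after the first character, so the simulation touches far fewer states than the worst-case $O(nm)$ bound suggests.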

Keywords

string algorithms, succinct data structure, computational complexity

Cite

@article{arxiv.1907.04752,
  title  = {Sparse Regular Expression Matching},
  author = {Philip Bille and Inge Li Gørtz},
  journal= {arXiv preprint arXiv:1907.04752},
  year   = {2023}
}
