Sparse Regular Expression Matching
Abstract
A regular expression specifies a set of strings formed by single characters combined with concatenation, union, and Kleene star operators. Given a regular expression $R$ and a string $Q$, the regular expression matching problem is to decide if $Q$ matches any of the strings specified by $R$. Regular expressions are a fundamental concept in formal languages, and regular expression matching is a basic primitive for searching and processing data. A standard textbook solution [Thompson, CACM 1968] constructs and simulates a nondeterministic finite automaton (NFA), leading to an $O(nm)$ time algorithm, where $n$ is the length of $Q$ and $m$ is the length of $R$. Despite considerable research efforts, only polylogarithmic improvements of this bound are known. Recently, conditional lower bounds provided evidence for this lack of progress when Backurs and Indyk [FOCS 2016] proved that, assuming the strong exponential time hypothesis (SETH), regular expression matching cannot be solved in $O((nm)^{1-\epsilon})$ time, for any constant $\epsilon > 0$. Hence, the complexity of regular expression matching is essentially settled in terms of $n$ and $m$. In this paper, we take a new approach and introduce a \emph{density} parameter, $\Delta$, that captures the amount of nondeterminism in the NFA simulation on $Q$. The density is at most $nm+1$ but can be significantly smaller. Our main result is a new algorithm that solves regular expression matching in $O\left(\Delta \log \log \frac{nm}{\Delta} + n + m\right)$ time. This essentially replaces $nm$ with $\Delta$ in the complexity of regular expression matching. We complement our upper bound by a matching conditional lower bound that proves that we cannot solve regular expression matching in $O(\Delta^{1-\epsilon})$ time for any constant $\epsilon > 0$, assuming SETH.
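To make the textbook baseline concrete, the following is a minimal Python sketch of a Thompson-style NFA construction and state-set simulation. It is not the paper's algorithm. All names (`NFA`, `build`, `closure`, `simulate`) are hypothetical, and the regex syntax is assumed to be limited to single-character literals, `|`, `*`, and parentheses, with well-formed input. The second return value of `simulate` sums the sizes of the active state-sets over the whole run; it is offered only as a rough proxy for the density idea described above, since the abstract does not give the exact definition.

```python
class NFA:
    """Thompson-style NFA: a state has one character edge or some epsilon edges."""
    def __init__(self):
        self.char = []  # char[s]: label of the character edge leaving s, or None
        self.next = []  # next[s]: target of that character edge
        self.eps = []   # eps[s]: list of epsilon-edge targets

    def new_state(self):
        self.char.append(None)
        self.next.append(None)
        self.eps.append([])
        return len(self.char) - 1


def build(regex):
    """Parse a well-formed regex into an NFA; returns (nfa, start, accept)."""
    nfa, pos = NFA(), 0

    def union():
        nonlocal pos
        s, a = concat()
        while pos < len(regex) and regex[pos] == "|":
            pos += 1
            s2, a2 = concat()
            ns, na = nfa.new_state(), nfa.new_state()
            nfa.eps[ns] += [s, s2]      # branch into either alternative
            nfa.eps[a].append(na)
            nfa.eps[a2].append(na)
            s, a = ns, na
        return s, a

    def concat():
        s = a = nfa.new_state()         # empty fragment to chain onto
        while pos < len(regex) and regex[pos] not in "|)":
            s2, a2 = star()
            nfa.eps[a].append(s2)
            a = a2
        return s, a

    def star():
        nonlocal pos
        s, a = atom()
        while pos < len(regex) and regex[pos] == "*":
            pos += 1
            ns, na = nfa.new_state(), nfa.new_state()
            nfa.eps[ns] += [s, na]      # enter the loop or skip it
            nfa.eps[a] += [s, na]       # repeat the loop or leave it
            s, a = ns, na
        return s, a

    def atom():
        nonlocal pos
        if regex[pos] == "(":
            pos += 1
            s, a = union()
            pos += 1                    # skip the closing ")"
            return s, a
        s, a = nfa.new_state(), nfa.new_state()
        nfa.char[s], nfa.next[s] = regex[pos], a
        pos += 1
        return s, a

    start, accept = union()
    return nfa, start, accept


def closure(nfa, states):
    """Return all states reachable from `states` via epsilon edges."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.eps[s]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def simulate(regex, text):
    """Whole-string match; returns (matched, total number of active states seen)."""
    nfa, start, accept = build(regex)
    current = closure(nfa, {start})
    total = len(current)
    for c in text:
        moved = {nfa.next[s] for s in current if nfa.char[s] == c}
        current = closure(nfa, moved)
        total += len(current)
    return accept in current, total
```

Each of the $n$ steps touches at most $O(m)$ states, which is where the $O(nm)$ bound comes from. On an input such as `simulate("abc", "xxxx")` the state-set becomes empty after the first character, so the total work is far below $nm$; this is the kind of gap a density-sensitive bound is meant to exploit.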
Cite
@article{arxiv.1907.04752,
title = {Sparse Regular Expression Matching},
author = {Philip Bille and Inge Li Gørtz},
journal= {arXiv preprint arXiv:1907.04752},
year = {2019}
}