Improved Extended Regular Expression Matching

Philip Bille; Inge Li Gørtz; Rikke Schjeldrup Jessen

Improved Extended Regular Expression Matching

Data Structures and Algorithms 2026-02-09 v2

Authors: Philip Bille , Inge Li Gørtz , Rikke Schjeldrup Jessen

Abstract

An extended regular expression $R$ specifies a set of strings formed by characters from an alphabet combined with concatenation, union, intersection, complement, and star operators. Given an extended regular expression $R$ and a string $Q$ , the extended regular expression matching problem is to decide if $Q$ matches any of the strings specified by $R$ . Extended regular expression matching was introduced by Hopcroft and Ullman in the 1970s, who gave a simple dynamic programming solution using $O(n^3m)$ time and $O(n^2m)$ space, where $n$ is the length of $Q$ and $m$ is the length of $R$ . The current state-of-the art solution, by Yamamoto and Miyazaki uses $O(\frac{n^3k + n^2m}{w} + n + m)$ time and $O(\frac{n^2k + nm}{w} + n + m)$ space, where $k$ is the number of negation and complement operators in $R$ and $w$ is the number of bits in a machine word. This roughly replaces the $m$ factor with $k$ in the dominant terms of both the space and time bounds of the classical Hopcroft and Ullman algorithm. In this paper, we present a new solution that solves extended regular expression matching in $O\left(n^\omega k + \frac{n^2m}{\max(w/\log w, \log n)} + m\right)$ time and $O(\frac{n^2 \log k}{w} + n + m) = O(n^2 +m)$ space, where $\omega \approx 2.3716$ is the exponent of matrix multiplication. Essentially, this replaces the dominant $n^3k$ term with $n^\omega k$ in the time bound, while simultaneously improving the $n^2k$ term in the space to $O(n^2)$ . To achieve our result, we develop several new insights and techniques of independent interest, including a new compact representation to store and efficiently combine substring matches, a new clustering technique for parse trees of extended regular expressions, and a new efficient combination of finite automaton simulation with substring match representation to speed up the classic dynamic programming solution.

Keywords

string algorithms

Cite

@article{arxiv.2510.09311,
  title  = {Improved Extended Regular Expression Matching},
  author = {Philip Bille and Inge Li Gørtz and Rikke Schjeldrup Jessen},
  journal= {arXiv preprint arXiv:2510.09311},
  year   = {2026}
}

Improved Extended Regular Expression Matching

Abstract

Keywords

Cite

Related papers