Improved Extended Regular Expression Matching

Data Structures and Algorithms 2026-02-09 v2

Abstract

An extended regular expression $R$ specifies a set of strings formed by characters from an alphabet combined with concatenation, union, intersection, complement, and star operators. Given an extended regular expression $R$ and a string $Q$, the extended regular expression matching problem is to decide if $Q$ matches any of the strings specified by $R$. Extended regular expression matching was introduced by Hopcroft and Ullman in the 1970s, who gave a simple dynamic programming solution using $O(n^3m)$ time and $O(n^2m)$ space, where $n$ is the length of $Q$ and $m$ is the length of $R$. The current state-of-the-art solution, by Yamamoto and Miyazaki, uses $O(\frac{n^3k + n^2m}{w} + n + m)$ time and $O(\frac{n^2k + nm}{w} + n + m)$ space, where $k$ is the number of negation and complement operators in $R$ and $w$ is the number of bits in a machine word. This roughly replaces the $m$ factor with $k$ in the dominant terms of both the space and time bounds of the classical Hopcroft and Ullman algorithm. In this paper, we present a new solution that solves extended regular expression matching in $O\left(n^\omega k + \frac{n^2m}{\max(w/\log w, \log n)} + m\right)$ time and $O(\frac{n^2 \log k}{w} + n + m) = O(n^2 + m)$ space, where $\omega \approx 2.3716$ is the exponent of matrix multiplication. Essentially, this replaces the dominant $n^3k$ term with $n^\omega k$ in the time bound, while simultaneously improving the $n^2k$ term in the space to $O(n^2)$. To achieve our result, we develop several new insights and techniques of independent interest, including a new compact representation to store and efficiently combine substring matches, a new clustering technique for parse trees of extended regular expressions, and a new efficient combination of finite automaton simulation with substring match representation to speed up the classic dynamic programming solution.
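To make the classical baseline concrete, the following is a minimal Python sketch of a substring-table dynamic program in the spirit of the $O(n^3m)$-time solution mentioned above. It is an illustration only, not the paper's new algorithm and not necessarily the exact formulation of Hopcroft and Ullman. The tuple encoding of the parse tree (`"chr"`, `"eps"`, `"cat"`, `"or"`, `"and"`, `"not"`, `"star"`) and the function names `match_table` and `matches` are assumptions made for this example.

```python
def match_table(node, q):
    """Return t where t[i][j] is True iff q[i:j] matches the subexpression
    `node` (only entries with i <= j are meaningful).

    `node` is a hypothetical tuple encoding of a parse tree:
    ("eps",), ("chr", c), ("cat", a, b), ("or", a, b),
    ("and", a, b), ("not", a), ("star", a).
    """
    n = len(q)
    kind = node[0]
    t = [[False] * (n + 1) for _ in range(n + 1)]
    if kind == "eps":
        for i in range(n + 1):
            t[i][i] = True
    elif kind == "chr":
        for i in range(n):
            t[i][i + 1] = q[i] == node[1]
    elif kind in ("or", "and"):
        a, b = match_table(node[1], q), match_table(node[2], q)
        for i in range(n + 1):
            for j in range(i, n + 1):
                if kind == "or":
                    t[i][j] = a[i][j] or b[i][j]
                else:
                    t[i][j] = a[i][j] and b[i][j]
    elif kind == "not":
        a = match_table(node[1], q)
        for i in range(n + 1):
            for j in range(i, n + 1):
                t[i][j] = not a[i][j]
    elif kind == "cat":
        # q[i:j] matches if it splits at some k into a left and a right match.
        a, b = match_table(node[1], q), match_table(node[2], q)
        for i in range(n + 1):
            for j in range(i, n + 1):
                t[i][j] = any(a[i][k] and b[k][j] for k in range(i, j + 1))
    elif kind == "star":
        # Process i from right to left so that t[k][j] with k > i is final.
        a = match_table(node[1], q)
        for i in range(n, -1, -1):
            t[i][i] = True  # the empty string always matches a star
            for j in range(i + 1, n + 1):
                t[i][j] = any(a[i][k] and t[k][j] for k in range(i + 1, j + 1))
    else:
        raise ValueError(f"unknown node kind: {kind!r}")
    return t


def matches(node, q):
    """Decide whether the whole string q matches the expression `node`."""
    return match_table(node, q)[0][len(q)]


# Example expression: strings over {a, b} that do not contain "bb",
# written as (a|b)* AND NOT((a|b)* b b (a|b)*).
SIGMA = ("or", ("chr", "a"), ("chr", "b"))
ANY = ("star", SIGMA)
HAS_BB = ("cat", ANY, ("cat", ("chr", "b"), ("cat", ("chr", "b"), ANY)))
NO_BB = ("and", ANY, ("not", HAS_BB))
```

Each of the $m$ parse-tree nodes fills an $(n+1) \times (n+1)$ table, and the concatenation and star cases spend $O(n)$ time per entry, which is where the $n^3m$ factor comes from. Complement and intersection are cheap here because they act entrywise on the tables; this sketch makes no attempt at the word-level parallelism or matrix-multiplication speedups discussed above.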

Cite

@article{arxiv.2510.09311,
  title  = {Improved Extended Regular Expression Matching},
  author = {Philip Bille and Inge Li Gørtz and Rikke Schjeldrup Jessen},
  journal= {arXiv preprint arXiv:2510.09311},
  year   = {2026}
}