Improved Extended Regular Expression Matching
Abstract
An extended regular expression $R$ specifies a set of strings formed by characters from an alphabet combined with concatenation, union, intersection, complement, and star operators. Given an extended regular expression $R$ and a string $Q$, the extended regular expression matching problem is to decide if $Q$ matches any of the strings specified by $R$. Extended regular expression matching was introduced by Hopcroft and Ullman in the 1970s, who gave a simple dynamic programming solution using $O(n^3 m)$ time and $O(n^2 m)$ space, where $n$ is the length of $Q$ and $m$ is the length of $R$. The current state-of-the-art solution, by Yamamoto and Miyazaki, uses $O\left(\frac{n^3 k + n^2 m}{w} + n + m\right)$ time and $O\left(\frac{n^2 k + n m}{w} + n + m\right)$ space, where $k$ is the number of negation and complement operators in $R$ and $w$ is the number of bits in a machine word. This roughly replaces the $m$ factor with $k$ in the dominant terms of both the space and time bounds of the classical Hopcroft and Ullman algorithm. In this paper, we present a new solution that solves extended regular expression matching in $O\left(n^\omega k + \frac{n^2 m}{\min(w/\log w, \log n)} + m\right)$ time and $O\left(\frac{n^2 \log k}{w} + n + m\right)$ space, where $\omega$ is the exponent of matrix multiplication. Essentially, this replaces the dominant $n^3 k$ term with $n^\omega k$ in the time bound, while simultaneously improving the $n^2 k$ term in the space to $n^2 \log k$. To achieve our result, we develop several new insights and techniques of independent interest, including a new compact representation to store and efficiently combine substring matches, a new clustering technique for parse trees of extended regular expressions, and a new efficient combination of finite automaton simulation with substring match representation to speed up the classic dynamic programming solution.
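The classic dynamic programming solution mentioned in the abstract can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions. It is neither the new algorithm of this paper nor necessarily Hopcroft and Ullman's exact formulation. The AST classes (`Char`, `Eps`, `Alt`, `Cat`, `And`, `Not`, `Star`) and the functions `table` and `matches` are hypothetical names introduced here, and complement is taken relative to all strings. For every subexpression of the pattern, the sketch builds an (n+1)×(n+1) Boolean table, where n is the length of the input string. Entry [i][j] records whether the substring q[i:j] belongs to that subexpression's language. Union, intersection, and complement combine tables entrywise. Concatenation and star try every split point, so each such node costs O(n^3), which gives O(n^3 m) time over m nodes and at most O(n^2 m) space for the tables.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union

# Hypothetical AST classes (illustrative names, not taken from the paper).
@dataclass(frozen=True)
class Char:  # a single alphabet character
    c: str

@dataclass(frozen=True)
class Eps:  # the empty string
    pass

@dataclass(frozen=True)
class Alt:  # union R1 | R2
    left: Node
    right: Node

@dataclass(frozen=True)
class Cat:  # concatenation R1 R2
    left: Node
    right: Node

@dataclass(frozen=True)
class And:  # intersection R1 & R2
    left: Node
    right: Node

@dataclass(frozen=True)
class Not:  # complement relative to all strings
    sub: Node

@dataclass(frozen=True)
class Star:  # Kleene star R*
    sub: Node

Node = Union[Char, Eps, Alt, Cat, And, Not, Star]

def table(node: Node, q: str) -> list[list[bool]]:
    """T[i][j] is True iff q[i:j] is in L(node), for 0 <= i <= j <= len(q).

    Entries with j < i are unused and always False.
    """
    n = len(q)
    idx = range(n + 1)
    if isinstance(node, Char):
        return [[j == i + 1 and q[i] == node.c for j in idx] for i in idx]
    if isinstance(node, Eps):
        return [[i == j for j in idx] for i in idx]
    if isinstance(node, (Alt, And)):
        a, b = table(node.left, q), table(node.right, q)
        if isinstance(node, Alt):
            return [[a[i][j] or b[i][j] for j in idx] for i in idx]
        return [[a[i][j] and b[i][j] for j in idx] for i in idx]
    if isinstance(node, Not):
        # Flip every meaningful entry (i <= j) of the child's table.
        a = table(node.sub, q)
        return [[i <= j and not a[i][j] for j in idx] for i in idx]
    if isinstance(node, Cat):
        # Try every split point k: O(n) per entry, O(n^3) per node.
        a, b = table(node.left, q), table(node.right, q)
        return [[i <= j and any(a[i][k] and b[k][j] for k in range(i, j + 1))
                 for j in idx] for i in idx]
    if isinstance(node, Star):
        a = table(node.sub, q)
        s = [[i == j for j in idx] for i in idx]
        # Fill by increasing length; the remainder q[k:j] with k > i is shorter.
        for length in range(1, n + 1):
            for i in range(n - length + 1):
                j = i + length
                s[i][j] = any(a[i][k] and s[k][j] for k in range(i + 1, j + 1))
        return s
    raise TypeError(f"unknown node: {node!r}")

def matches(r: Node, q: str) -> bool:
    """Decide whether the whole string q is in L(r)."""
    return table(r, q)[0][len(q)]

# For strings over {a, b}: those with no "aa", i.e. NOT((a|b)* aa (a|b)*).
sigma_star = Star(Alt(Char("a"), Char("b")))
no_aa = Not(Cat(Cat(sigma_star, Cat(Char("a"), Char("a"))), sigma_star))
print(matches(no_aa, "abab"))   # True
print(matches(no_aa, "abaab"))  # False
```

Computing each node's table from its children's tables is what makes intersection and complement cheap to add here. They are just entrywise AND and NOT. The cubic cost is concentrated in concatenation and star.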
Cite
@article{arxiv.2510.09311,
title = {Improved Extended Regular Expression Matching},
author = {Philip Bille and Inge Li Gørtz and Rikke Schjeldrup Jessen},
journal = {arXiv preprint arXiv:2510.09311},
year = {2026}
}