Internal Dictionary Matching

Data Structures and Algorithms 2019-09-26 v1

Abstract

We introduce data structures answering queries concerning the occurrences of patterns from a given dictionary $\mathcal{D}$ in fragments of a given string $T$ of length $n$. The dictionary is internal in the sense that each pattern in $\mathcal{D}$ is given as a fragment of $T$. This way, $\mathcal{D}$ takes space proportional to the number of patterns $d=|\mathcal{D}|$ rather than their total length, which could be $\Theta(n\cdot d)$. In particular, we consider the following types of queries: reporting and counting all occurrences of patterns from $\mathcal{D}$ in a fragment $T[i..j]$ and reporting distinct patterns from $\mathcal{D}$ that occur in $T[i..j]$. We show how to construct, in $\mathcal{O}((n+d) \log^{\mathcal{O}(1)} n)$ time, a data structure that answers each of these queries in time $\mathcal{O}(\log^{\mathcal{O}(1)} n+|output|)$. The case of counting patterns is much more involved and needs a combination of a locally consistent parsing with orthogonal range searching. Reporting distinct patterns, on the other hand, uses the structure of maximal repetitions in strings. Finally, we provide tight (up to subpolynomial factors) upper and lower bounds for the case of a dynamic dictionary.
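To make the three query types concrete, the following is a minimal brute-force sketch of their semantics only. It is not the data structure from the paper and makes no attempt at the stated time bounds. The function names, the 0-based inclusive indexing, and the representation of each pattern as a `(start, end)` pair into $T$ are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Assumption: a fragment is an inclusive, 0-based (start, end) pair into T.
Fragment = Tuple[int, int]


def report_occurrences(T: str, D: List[Fragment], i: int, j: int) -> List[Tuple[int, int]]:
    """Naively list (pattern_index, start) for every occurrence of a
    dictionary pattern that lies entirely inside T[i..j]."""
    out = []
    for k, (a, b) in enumerate(D):
        pattern = T[a:b + 1]          # the pattern is itself a fragment of T
        m = len(pattern)
        # An occurrence starting at s must end at s + m - 1 <= j.
        for s in range(i, j - m + 2):
            if T[s:s + m] == pattern:
                out.append((k, s))
    return out


def count_occurrences(T: str, D: List[Fragment], i: int, j: int) -> int:
    """Number of occurrences of dictionary patterns inside T[i..j]."""
    return len(report_occurrences(T, D, i, j))


def report_distinct(T: str, D: List[Fragment], i: int, j: int) -> List[int]:
    """Indices of dictionary patterns that occur at least once in T[i..j]."""
    return sorted({k for k, _ in report_occurrences(T, D, i, j)})


T = "abaab"
D = [(0, 0), (0, 1), (1, 2)]          # the patterns "a", "ab", "ba"
print(report_occurrences(T, D, 1, 3)) # query on the fragment "baa"
print(count_occurrences(T, D, 0, 4))
print(report_distinct(T, D, 1, 3))
```

Note that the dictionary is stored as $d$ index pairs rather than as $d$ explicit strings, which mirrors the point that an internal dictionary takes space proportional to the number of patterns. Each naive query here rescans the fragment for every pattern, which is the cost the paper's polylogarithmic-time structures are designed to avoid.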

Cite

@article{arxiv.1909.11577,
  title  = {Internal Dictionary Matching},
  author = {Panagiotis Charalampopoulos and Tomasz Kociumaka and Manal Mohamed and Jakub Radoszewski and Wojciech Rytter and Tomasz Waleń},
  journal= {arXiv preprint arXiv:1909.11577},
  year   = {2019}
}

Comments

A short version of this paper was accepted for presentation at ISAAC 2019