Linear-Space Substring Range Counting over Polylogarithmic Alphabets

Data Structures and Algorithms 2012-02-16 v1

Abstract

Bille and Gørtz (2011) recently introduced the problem of substring range counting, for which we are asked to store compactly a string $S$ of $n$ characters with integer labels in $[0, u]$, such that later, given an interval $[a, b]$ and a pattern $P$ of length $m$, we can quickly count the occurrences of $P$ whose first characters' labels are in $[a, b]$. They showed how to store $S$ in $O(n \log n / \log \log n)$ space and answer queries in $O(m + \log \log u)$ time. We show that, if $S$ is over an alphabet of size $\mathrm{polylog}(n)$, then we can achieve optimal linear space. Moreover, if $u = n \, \mathrm{polylog}(n)$, then we can also reduce the time to $O(m)$. Our results give linear space and time bounds for position-restricted substring counting and the counting versions of indexing substrings with intervals, indexing substrings with gaps and aligned pattern matching.
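
To make the query semantics concrete, the following is a minimal brute-force sketch of a substring range counting query. It is not the data structure described in the abstract: it scans the whole string on every query and is meant only as a reference for what is being counted. The function name `substring_range_count` and its parameters are assumptions introduced for illustration.

```python
def substring_range_count(s, labels, pattern, a, b):
    """Count occurrences of `pattern` in `s` whose first character's
    label lies in the closed interval [a, b].

    Naive baseline (hypothetical helper, not the paper's method):
    it checks every starting position, so a query costs O(n * m) time.
    """
    if len(s) != len(labels):
        raise ValueError("s and labels must have the same length")
    m = len(pattern)
    if m == 0:
        return 0  # assumption: an empty pattern has no occurrences
    count = 0
    for i in range(len(s) - m + 1):
        # The label of the occurrence's first character must be in [a, b].
        if a <= labels[i] <= b and s[i:i + m] == pattern:
            count += 1
    return count


# Position-restricted substring counting is the special case in which
# each character's label is its own position in the string.
text = "abracadabra"
positions = list(range(len(text)))
in_first_half = substring_range_count(text, positions, "abra", 0, 5)
anywhere = substring_range_count(text, positions, "abra", 0, 10)
```

Here `"abra"` occurs at positions 0 and 7, so restricting the first character's label to $[0, 5]$ keeps only the first occurrence, while the interval $[0, 10]$ keeps both.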

Cite

@article{arxiv.1202.3208,
  title  = {Linear-Space Substring Range Counting over Polylogarithmic Alphabets},
  author = {Travis Gagie and Paweł Gawrychowski},
  journal= {arXiv preprint arXiv:1202.3208},
  year   = {2012}
}