Iterated Straight-Line Programs

Data Structures and Algorithms 2024-02-16 v2

Abstract

We explore an extension to straight-line programs (SLPs) that outperforms, for some text families, the measure $\delta$ based on substring complexity, a lower bound for most measures and compressors exploiting repetitiveness (which are crucial in areas like Bioinformatics). The extension, called iterated SLPs (ISLPs), allows rules of the form $A \rightarrow \Pi_{i=k_1}^{k_2} B_1^{i^{c_1}}\cdots B_t^{i^{c_t}}$, for which we show how to extract any substring of length $\lambda$, from the represented text $T[1..n]$, in time $O(\lambda + \log^2 n\log\log n)$. This is the first compressed representation for repetitive texts breaking $\delta$ while, at the same time, supporting direct access to arbitrary text symbols in polylogarithmic time. As a byproduct, we extend Ganardi et al.'s technique to balance any SLP (so it has a derivation tree of logarithmic height) to a wide generalization of SLPs, including ISLPs.
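
To make the rule form concrete, the following is a minimal sketch that naively expands a grammar containing iterated rules into its full text. It is not the paper's access structure: it runs in time linear in the output length, whereas the abstract's result is polylogarithmic-time extraction. The names `Iter`, `expand`, and the dictionary encoding of rules are hypothetical, and the sketch assumes $k_1 \le k_2$ and that any symbol without a rule is a terminal.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Iter:
    """Hypothetical encoding of A -> prod_{i=k1}^{k2} B_1^{i^c_1} ... B_t^{i^c_t}."""
    k1: int
    k2: int
    factors: Tuple[Tuple[str, int], ...]  # pairs (B_j, c_j)


# A rule is either an iterated rule or a plain concatenation of symbols.
Rule = Union[Iter, Tuple[str, ...]]


def expand(symbol: str, rules: Dict[str, Rule],
           memo: Optional[Dict[str, str]] = None) -> str:
    """Naively expand `symbol` into the string it derives (assumes k1 <= k2)."""
    if memo is None:
        memo = {}
    if symbol not in rules:        # assumption: symbols without a rule are terminals
        return symbol
    if symbol in memo:
        return memo[symbol]
    rule = rules[symbol]
    if isinstance(rule, Iter):
        parts = []
        for i in range(rule.k1, rule.k2 + 1):
            for name, c in rule.factors:
                # B_j is repeated i^{c_j} times in the i-th iteration.
                parts.append(expand(name, rules, memo) * (i ** c))
        out = "".join(parts)
    else:
        out = "".join(expand(s, rules, memo) for s in rule)
    memo[symbol] = out
    return out


# Example: A -> prod_{i=1}^{3} a^{i} b^{1}, and S -> A A.
rules: Dict[str, Rule] = {
    "A": Iter(1, 3, (("a", 1), ("b", 0))),
    "S": ("A", "A"),
}
print(expand("A", rules))  # abaabaaab
```

In this toy example a single rule of constant size derives a string whose length grows quadratically in $k_2$, which illustrates why iterated rules can be far more succinct than plain SLP rules on such families.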

Cite

@article{arxiv.2402.09232,
  title  = {Iterated Straight-Line Programs},
  author = {Gonzalo Navarro and Cristian Urbina},
  journal= {arXiv preprint arXiv:2402.09232},
  year   = {2024}
}

Comments

This version of the article includes the proofs omitted from the LATIN24 version.