Substring Complexities on Run-length Compressed Strings

Data Structures and Algorithms 2022-05-26 v1

Abstract

Let $S_{T}(k)$ denote the set of distinct substrings of length $k$ in a string $T$; the $k$-th substring complexity of $T$ is its cardinality $|S_{T}(k)|$. Recently, $\delta = \max \{ |S_{T}(k)| / k : k \ge 1 \}$ has been shown to be a good compressibility measure for highly repetitive strings. In this paper, given $T$ of length $n$ in run-length compressed form of size $r$, we show that $\delta$ can be computed in $\mathit{C}_{\mathsf{sort}}(r, n)$ time and $O(r)$ space, where $\mathit{C}_{\mathsf{sort}}(r, n) = O(\min (r \lg\lg r, r \lg_{r} n))$ is the time complexity of sorting $r$ integers of $O(\lg n)$ bits each in $O(r)$ space in the Word-RAM model with word size $\Omega(\lg n)$.
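To make the definitions concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm of the paper: it works on the uncompressed string and enumerates every substring, so it is only suitable for small inputs. The function names `run_length_encode`, `substring_complexity`, and `delta` are illustrative choices, and the input is assumed to be a non-empty string.

```python
from itertools import groupby


def run_length_encode(text):
    """Return the run-length encoding of text as (character, run length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(text)]


def substring_complexity(text, k):
    """Return |S_T(k)|, the number of distinct substrings of length k in text."""
    return len({text[i:i + k] for i in range(len(text) - k + 1)})


def delta(text):
    """Return delta = max over k >= 1 of |S_T(k)| / k, by brute force.

    Assumes text is non-empty; lengths k > len(text) contribute 0 and are skipped.
    """
    return max(substring_complexity(text, k) / k for k in range(1, len(text) + 1))
```

For the string `aaabbb`, `run_length_encode` returns `[('a', 3), ('b', 3)]`, so $r = 2$ and $n = 6$, and `delta` returns `2.0`, attained at $k = 1$ where the two distinct substrings are `a` and `b`.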

Cite

@article{arxiv.2205.12421,
  title  = {Substring Complexities on Run-length Compressed Strings},
  author = {Akiyoshi Kawamoto and Tomohiro I},
  journal= {arXiv preprint arXiv:2205.12421},
  year   = {2022}
}