A Compressed-Gap Data-Aware Measure

Data Structures and Algorithms 2015-05-15 v2

Abstract

In this paper, we consider the problem of efficiently representing a set $S$ of $n$ items out of a universe $U=\{0,...,u-1\}$ while supporting a number of operations on it. Let $G=g_1...g_n$ be the gap stream associated with $S$, $gap$ its bit-size when encoded with \emph{gap-encoding}, and $H_0(G)$ its empirical zero-order entropy. We prove that (1) $nH_0(G)\in o(gap)$ if $G$ is highly compressible, and (2) $nH_0(G) \leq n\log(u/n) + n \leq uH_0(S)$. Let $d$ be the number of \emph{distinct} gap lengths between elements in $S$. We first propose a new space-efficient zero-order compressed representation of $S$ taking $n(H_0(G)+1)+\mathcal{O}(d\log u)$ bits of space. Then, we describe a fully-indexable dictionary that supports \emph{rank} and \emph{select} queries in $\mathcal{O}(\log(u/n)+\log\log u)$ time while requiring asymptotically the same space as the proposed compressed representation of $S$.
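To make the quantities in the abstract concrete, the following minimal Python sketch computes the gap stream $G$, its empirical zero-order entropy $H_0(G)$, the number $d$ of distinct gaps, and the two bounds $n\log(u/n)+n$ and $uH_0(S)$ for a small set. Several details are assumptions made for illustration and are not taken from the paper: the gap convention ($g_1 = s_1 + 1$, $g_i = s_i - s_{i-1}$), the gap-encoding cost of $\lfloor\log_2 g\rfloor + 1$ bits per gap, and all function names.

```python
from collections import Counter
from math import log2


def gap_stream(sorted_items):
    """Return the gaps g_1..g_n of a sorted list of distinct integers.

    Assumed convention (not from the paper): g_1 = s_1 + 1 and
    g_i = s_i - s_{i-1}, so every gap is at least 1.
    """
    gaps = []
    prev = -1
    for s in sorted_items:
        gaps.append(s - prev)
        prev = s
    return gaps


def zero_order_entropy(seq):
    """Empirical zero-order entropy H_0 of a sequence, in bits per symbol."""
    n = len(seq)
    counts = Counter(seq)
    return sum((c / n) * log2(n / c) for c in counts.values())


def gap_encoding_bits(gaps):
    """Assumed gap-encoding cost: floor(log2 g) + 1 bits for each gap g."""
    return sum(g.bit_length() for g in gaps)


def bitvector_entropy_bits(n, u):
    """u * H_0(S) for the characteristic bitvector of S (requires 0 < n < u)."""
    return n * log2(u / n) + (u - n) * log2(u / (u - n))


if __name__ == "__main__":
    u = 32
    S = [3, 7, 11, 15, 19, 23, 27, 31]  # highly regular set: one distinct gap
    n = len(S)
    G = gap_stream(S)
    print("gap stream G:       ", G)
    print("distinct gaps d:    ", len(set(G)))
    print("n * H_0(G):         ", n * zero_order_entropy(G))
    print("gap-encoding bits:  ", gap_encoding_bits(G))
    print("n*log2(u/n) + n:    ", n * log2(u / n) + n)
    print("u * H_0(S):         ", bitvector_entropy_bits(n, u))
```

On a perfectly regular set like this one, every gap is identical, so $nH_0(G)$ is zero while the gap-encoded size still grows linearly in $n$. This is the kind of highly compressible input that claim (1) is about.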

Cite

@article{arxiv.1502.03288,
  title  = {A Compressed-Gap Data-Aware Measure},
  author = {Nicola Prezza},
  journal= {arXiv preprint arXiv:1502.03288},
  year   = {2015}
}

Comments

11 pages, 2 tables
