Re-Pair In Small Space

Dominik Köppl; Tomohiro I; Isamu Furuya; Yoshimasa Takabatake; Kensuke Sakai; Keisuke Goto

Re-Pair In Small Space

Data Structures and Algorithms 2019-11-19 v3

Authors: Dominik Köppl , Tomohiro I , Isamu Furuya , Yoshimasa Takabatake , Kensuke Sakai , Keisuke Goto

Abstract

Re-Pair is a grammar compression scheme with favorably good compression rates. The computation of Re-Pair comes with the cost of maintaining large frequency tables, which makes it hard to compute Re-Pair on large scale data sets. As a solution for this problem we present, given a text of length $n$ whose characters are drawn from an integer alphabet, an $O(n^2) \cap O(n^2 \lg \log_\tau n \lg \lg \lg n / \log_\tau n)$ time algorithm computing Re-Pair in $n \lg \max(n,\tau)$ bits of space including the text space, where $\tau$ is the number of terminals and non-terminals. The algorithm works in the restore model, supporting the recovery of the original input in the time for the Re-Pair computation with $O(\lg n)$ additional bits of working space. We give variants of our solution working in parallel or in the external memory model.

Keywords

succinct data structure string algorithms graph algorithm

Cite

@article{arxiv.1908.04933,
  title  = {Re-Pair In Small Space},
  author = {Dominik Köppl and Tomohiro I and Isamu Furuya and Yoshimasa Takabatake and Kensuke Sakai and Keisuke Goto},
  journal= {arXiv preprint arXiv:1908.04933},
  year   = {2019}
}

Re-Pair In Small Space

Abstract

Keywords

Cite

Related papers