Re-Pair In Small Space
Data Structures and Algorithms
2019-11-19 v3
Abstract
Re-Pair is a grammar compression scheme with favorable compression rates. Computing Re-Pair requires maintaining large frequency tables, which makes it hard to apply to large-scale data sets. As a solution to this problem we present, given a text of length $n$ whose characters are drawn from an integer alphabet, an […] time algorithm computing Re-Pair in […] bits of space including the text space, where […] is the number of terminals and non-terminals. The algorithm works in the restore model, supporting the recovery of the original input in the time for the Re-Pair computation with […] additional bits of working space. We give variants of our solution working in parallel or in the external memory model.
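For readers unfamiliar with the scheme, the following is a minimal sketch of plain Re-Pair: repeatedly replace the most frequent pair of adjacent symbols with a fresh non-terminal until no pair occurs twice. It is a naive illustration that rebuilds the frequency table in every round, not the small-space algorithm of the paper. The function names (`count_bigrams`, `replace_pair`, `repair`, `expand`), the use of integers as non-terminals, and the tie-breaking rule (first pair encountered) are assumptions made for this example.

```python
from collections import Counter


def count_bigrams(seq):
    """Count non-overlapping occurrences of each adjacent pair (left to right)."""
    counts = Counter()
    last_counted = {}  # pair -> start position of its last counted occurrence
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        # In a run such as "aaa", the pair ("a", "a") at position i overlaps
        # the occurrence counted at position i - 1, so it is skipped.
        if pair[0] == pair[1] and last_counted.get(pair) == i - 1:
            continue
        counts[pair] += 1
        last_counted[pair] = i
    return counts


def replace_pair(seq, pair, symbol):
    """Replace every non-overlapping occurrence of `pair` by `symbol`."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text):
    """Return (start sequence, rules) for the string `text`.

    Terminals are the characters of `text`; non-terminals are the
    integers 0, 1, 2, ... (an assumption made for this sketch).
    """
    seq = list(text)
    rules = {}
    while True:
        counts = count_bigrams(seq)
        if not counts:
            break
        # Ties are broken in favor of the pair seen first.
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:
            break
        symbol = len(rules)  # fresh non-terminal
        rules[symbol] = pair
        seq = replace_pair(seq, pair, symbol)
    return seq, rules


def expand(seq, rules):
    """Decode a sequence of terminals and non-terminals back to a string."""
    out = []
    stack = list(reversed(seq))
    while stack:
        s = stack.pop()
        if s in rules:
            left, right = rules[s]
            stack.append(right)
            stack.append(left)
        else:
            out.append(s)
    return "".join(out)
```

Calling `repair("abracadabra")` yields a start sequence together with a rule table, and `expand` applied to both recovers the input. The per-round frequency table built by `count_bigrams` is the kind of auxiliary structure whose size motivates the small-space approach described above.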
Cite
@article{arxiv.1908.04933,
title = {Re-Pair In Small Space},
author = {Dominik Köppl and Tomohiro I and Isamu Furuya and Yoshimasa Takabatake and Kensuke Sakai and Keisuke Goto},
journal = {arXiv preprint arXiv:1908.04933},
year = {2019}
}