Related papers: Space-efficient conversions from SLPs
A Straight-Line Program (SLP) $G$ for a string $T$ is a context-free grammar (CFG) that derives $T$ only, which can be considered as a compressed representation of $T$. In this paper, we show how to encode $G$ in $n \lceil \lg N \rceil + (n…
We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context free grammar in the Chomsky normal form that generates a single string.…
Consider a text $T [1..n]$ prefixed by a reference sequence $R = T [1..\ell]$. We show how, given $R$ and the $z'$-phrase relative Lempel-Ziv parse of $T [\ell + 1..n]$ with respect to $R$, we can build the LZ77 parse of $T$ in…
In this paper we present a really simple linear-time algorithm constructing a context-free grammar of size O(g log (N/g)) for the input string, where N is the size of the input string and g the size of the optimal grammar generating this…
We introduce a new class of straight-line programs (SLPs), named the Lyndon SLP, inspired by the Lyndon trees (Barcelo, 1990). Based on this SLP, we propose a self-index data structure of $O(g)$ words of space that can be built from a…
It was recently proved that any SLP generating a given string $w$ can be transformed in linear time into an equivalent balanced SLP of the same asymptotic size. We show that this result also holds for RLSLPs, which are SLPs extended with…
We present a new on-line algorithm for computing the Lempel-Ziv factorization of a string that runs in $O(N\log N)$ time and uses only $O(N\log\sigma)$ bits of working space, where $N$ is the length of the string and $\sigma$ is the size of…
The Lempel-Ziv parsing of a string (LZ77 for short) is one of the most important and widely-used algorithmic tools in data compression and string processing. We show that the Lempel-Ziv parsing of a string of length $n$ on an alphabet of…
In this paper we present a simple linear-time algorithm constructing a context-free grammar of size O(g log(N/g)) for the input string, where N is the size of the input string and g the size of the optimal grammar generating this string.…
We propose a new approach for calculating the Lempel-Ziv factorization of a string, based on run length encoding (RLE). We present a conceptually simple off-line algorithm based on a variant of suffix arrays, as well as an on-line algorithm…
We consider the problem of decompressing the Lempel--Ziv 77 representation of a string $S$ of length $n$ using a working space as close as possible to the size $z$ of the input. The folklore solution for the problem runs in $O(n)$ time but…
We present an algorithm that computes the Lempel-Ziv decomposition in $O(n(\log\sigma + \log\log n))$ time and $n\log\sigma + \epsilon n$ bits of space, where $\epsilon$ is a constant rational parameter, $n$ is the length of the input…
It was recently proved that any Straight-Line Program (SLP) generating a given string can be transformed in linear time into an equivalent balanced SLP of the same asymptotic size. We generalize this proof to a general class of grammars we…
Grammar compression is, next to Lempel-Ziv (LZ77) and run-length Burrows-Wheeler transform (RLBWT), one of the most flexible approaches to representing and processing highly compressible strings. The main idea is to represent a text as a…
We present a new algorithm for computing the Lempel-Ziv Factorization (LZ77) of a given string of length $N$ in linear time, that utilizes only $N\log N + O(1)$ bits of working space, i.e., a single integer array, for constant size integer…
Lempel-Ziv (LZ77) factorization is a fundamental problem in string processing: Greedily partition a given string $T$ from left to right into blocks (called phrases) so that each phrase is either the leftmost occurrence of a letter or the…
We describe how, given a text $T [1..n]$ and a positive constant $\epsilon$, we can build a simple $O (z \log n)$-space index, where $z$ is the number of phrases in the LZ77 parse of $T$, such that later, given a pattern $P [1..m]$, in $O…
In grammar-based compression a string is represented by a context-free grammar, also called a straight-line program (SLP), that generates only that string. We refine a recent balancing result stating that one can transform an SLP of size…
To store and search genomic databases efficiently, researchers have recently started building compressed self-indexes based on grammars. In this paper we show how, given a straight-line program with $r$ rules for a string (S [1..n]) whose…
For both the Lempel Ziv 77- and 78-factorization we propose algorithms generating the respective factorization using $(1+\epsilon) n \lg n + O(n)$ bits (for any positive constant $\epsilon \le 1$) working space (including the space for the…