Related papers: Practical and Effective Re-Pair Compression
Re-Pair is an effective grammar-based compression scheme achieving strong compression rates in practice. Let $n$, $\sigma$, and $d$ be the text length, alphabet size, and dictionary size of the final grammar, respectively. In their original…
Re-Pair is a grammar compression scheme with favorably good compression rates. The computation of Re-Pair comes with the cost of maintaining large frequency tables, which makes it hard to compute Re-Pair on large scale data sets. As a…
Given a string $T$ of length $N$, the goal of grammar compression is to construct a small context-free grammar generating only $T$. Among existing grammar compression methods, RePair (recursive paring) [Larsson and Moffat, 1999] is notable…
The goal of grammar compression is to construct a small sized context free grammar which uniquely generates the input text data. Among grammar compression methods, RePair is known for its good practical compression performance. MR-RePair…
We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces step by step the most frequent pairs within the…
Data compression is a powerful tool for managing massive but repetitive datasets, especially schemes such as grammar-based compression that support computation over the data without decompressing it. In the best case such a scheme takes a…
Grammar-based compression is a loss-less data compression scheme that represents a given string $w$ by a context-free grammar that generates only $w$. While computing the smallest grammar which generates a given string $w$ is NP-hard in…
Compression of inverted lists with methods that support fast intersection operations is an active research topic. Most compression schemes rely on encoding differences between consecutive positions with techniques that favor small numbers.…
The compression is an important topic in computer science which allows we to storage more amount of data on our data storage. There are several techniques to compress any file. In this manuscript will be described the most important…
Grammar compression is a general compression framework in which a string $T$ of length $N$ is represented as a context-free grammar of size $n$ whose language contains only $T$. In this paper, we focus on studying the limitations of…
Grammar compression represents a string as a context free grammar. Achieving compression requires encoding such grammar as a binary string; there are a few commonly used encodings. We bound the size of practically used encodings for several…
We present OnPair, a dictionary-based compression algorithm designed to meet the needs of in-memory database systems that require both high compression and fast random access. Existing methods either achieve strong compression ratios at…
Various grammar compression algorithms have been proposed in the last decade. A grammar compression is a restricted CFG deriving the string deterministically. An efficient grammar compression develops a smaller CFG by finding duplicated…
In this paper we present an application of a simple technique of local recompression, previously developed by the author in the context of compressed membership problems and compressed pattern matching, to word equations. The technique is…
In this paper, a fully compressed pattern matching problem is studied. The compression is represented by straight-line programs (SLPs), i.e. a context-free grammars generating exactly one string; the term fully means that both the pattern…
We present a new graph compressor that works by recursively detecting repeated substructures and representing them through grammar rules. We show that for a large number of graphs the compressor obtains smaller representations than other…
In this paper we present a simple linear-time algorithm constructing a context-free grammar of size O(g log(N/g)) for the input string, where N is the size of the input string and g the size of the optimal grammar generating this string.…
Compressed indexing is a powerful technique that enables efficient querying over data stored in compressed form, significantly reducing memory usage and often accelerating computation. While extensive progress has been made for…
In this work we introduce a new linear time compression algorithm, called "Re-pair for Trees", which compresses ranked ordered trees using linear straight-line context-free tree grammars. Such grammars generalize straight-line context-free…
The most fundamental problem considered in algorithms for text processing is pattern matching: given a pattern $p$ of length $m$ and a text $t$ of length $n$, does $p$ occur in $t$? Multiple versions of this basic question have been…