Related papers: Re-Pair In Small Space
Re-Pair is an effective grammar-based compression scheme achieving strong compression rates in practice. Let $n$, $\sigma$, and $d$ be the text length, alphabet size, and dictionary size of the final grammar, respectively. In their original…
Given a string $T$ of length $N$, the goal of grammar compression is to construct a small context-free grammar generating only $T$. Among existing grammar compression methods, RePair (recursive pairing) [Larsson and Moffat, 1999] is notable…
Re-Pair is an efficient grammar compressor that operates by recursively replacing high-frequency character pairs with new grammar symbols. The most space-efficient linear-time algorithm computing Re-Pair uses $(1+\epsilon)n+\sqrt n$ words…
The goal of grammar compression is to construct a small context-free grammar that uniquely generates the input text. Among grammar compression methods, RePair is known for its good practical compression performance. MR-RePair…
We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces step by step the most frequent pairs within the…
Compression of inverted lists with methods that support fast intersection operations is an active research topic. Most compression schemes rely on encoding differences between consecutive positions with techniques that favor small numbers.…
Data compression is a powerful tool for managing massive but repetitive datasets, especially schemes such as grammar-based compression that support computation over the data without decompressing it. In the best case such a scheme takes a…
Compression is an important topic in computer science that allows us to store more data on our storage devices. There are several techniques for compressing a file. This manuscript describes the most important…
We present OnPair, a dictionary-based compression algorithm designed to meet the needs of in-memory database systems that require both high compression and fast random access. Existing methods either achieve strong compression ratios at…
Grammar-based compression is a lossless data compression scheme that represents a given string $w$ by a context-free grammar that generates only $w$. While computing the smallest grammar that generates a given string $w$ is NP-hard in…
Grammar compression is a general compression framework in which a string $T$ of length $N$ is represented as a context-free grammar of size $n$ whose language contains only $T$. In this paper, we focus on studying the limitations of…
In this paper we present an application of a simple technique of local recompression, previously developed by the author in the context of compressed membership problems and compressed pattern matching, to word equations. The technique is…
In this paper we present a simple linear-time algorithm constructing a context-free grammar of size O(g log(N/g)) for the input string, where N is the size of the input string and g the size of the optimal grammar generating this string.…
The most fundamental problem considered in algorithms for text processing is pattern matching: given a pattern $p$ of length $m$ and a text $t$ of length $n$, does $p$ occur in $t$? Multiple versions of this basic question have been…
Neighborhood queries and triple queries are the most common queries on graphs; thus, it is desirable to answer them efficiently on compressed data structures. We present a compression scheme called Incidence-Type-RePair (ITR) for graphs…
Grammar compression represents a string as a context-free grammar. Achieving compression requires encoding such a grammar as a binary string; there are a few commonly used encodings. We bound the size of practically used encodings for several…
Various grammar compression algorithms have been proposed in the last decade. A grammar compression is a restricted CFG that derives the string deterministically. An efficient grammar compression develops a smaller CFG by finding duplicated…
In this paper, a fully compressed pattern matching problem is studied. The compression is represented by straight-line programs (SLPs), i.e., context-free grammars generating exactly one string; the term fully means that both the pattern…
In this work we introduce a new linear time compression algorithm, called "Re-pair for Trees", which compresses ranked ordered trees using linear straight-line context-free tree grammars. Such grammars generalize straight-line context-free…
We present a new algorithm for subsequence matching in grammar compressed strings. Given a grammar of size $n$ compressing a string of size $N$ and a pattern string of size $m$ over an alphabet of size $\sigma$, our algorithm uses…