Related papers: Fast and Compact Prefix Codes
Most of the attention in statistical compression is given to the space used by the compressed sequence, a problem completely solved with optimal prefix codes. However, in many applications, the storage space used to represent the prefix…
For many kinds of prefix-free codes there are efficient and compact alternatives to the traditional tree-based representation. Since these put the codes into canonical form, however, they can only be used when we can choose the order in…
In this paper we study the adaptive prefix coding problem in cases where the size of the input alphabet is large. We present an online prefix coding algorithm that uses $O(\sigma^{1 / \lambda + \epsilon}) $ bits of space for any constants…
We study the problems of finding a shortest synchronizing word and its length for a given prefix code. This is done in two different settings: when the code is defined by an arbitrary decoder recognizing its star and when the code is…
A common complaint about adaptive prefix coding is that it is much slower than static prefix coding. Karpinski and Nekrich recently took an important step towards resolving this: they gave an adaptive Shannon coding algorithm that encodes…
Let $s$ be a string of length $n$ over an alphabet of constant size $\sigma$ and let $c$ and $\epsilon$ be constants with (1 \geq c \geq 0) and (\epsilon > 0). Using (O (n)) time, (O (n^c)) bits of memory and one pass we can always encode…
Encoding data structures store enough information to answer the queries they are meant to support but not enough to recover their underlying datasets. In this paper we give the first encoding data structure for the challenging problem of…
In prefix coding over an infinite alphabet, methods that consider specific distributions generally consider those that decline more quickly than a power law (e.g., Golomb coding). Particular power-law distributions, however, model many…
The suffix array and the suffix tree are the two most fundamental data structures for string processing. For a length-$n$ text, however, they use $\Theta(n \log n)$ bits of space, which is often too costly. To address this, Grossi and…
In this paper we describe compressed indexes that support pattern matching queries for strings with wildcards. For a constant size alphabet our data structure uses $O(n\log^{\varepsilon}n)$ bits for any $\varepsilon>0$ and reports all…
We prove that longest common prefix (LCP) information can be stored in much less space than previously known. More precisely, we show that in the presence of the text and the suffix array, o(n) additional bits are sufficient to answer…
A new method for constructing minimum-redundancy binary prefix codes is described. Our method does not explicitly build a Huffman tree; instead it uses a property of optimal prefix codes to compute the codeword lengths corresponding to the…
There is a common problem of operating on hash values of elements of some database. In this paper there will be analyzed informational content of such general task and how to practically approach such found lower boundaries. Minimal prefix…
Suppose that we are given a string $s$ of length $n$ over an alphabet $\{0,1,\ldots,n^{O(1)}\}$ and $\delta$ is the string complexity of $s$, a known compression measure. We describe an index on $s$ with $O(\delta\log\frac{n}{\delta})$…
We describe how, given a text $T [1..n]$ and a positive constant $\epsilon$, we can build a simple $O (z \log n)$-space index, where $z$ is the number of phrases in the LZ77 parse of $T$, such that later, given a pattern $P [1..m]$, in $O…
Given a string of length $n$ that is composed of $r$ runs of letters from the alphabet $\{0,1,\ldots,\sigma{-}1\}$ such that $2 \le \sigma \le r$, we describe a data structure that, provided $r \le n / \log^{\omega(1)} n$, stores the string…
Huffman coding finds an optimal prefix code for a given probability mass function. Consider situations in which one wishes to find an optimal code with the restriction that all codewords have lengths that lie in a user-specified set of…
In this paper, we present a new data structure called the packed compact trie (packed c-trie) which stores a set $S$ of $k$ strings of total length $n$ in $n \log\sigma + O(k \log n)$ bits of space and supports fast pattern matching queries…
We give the first algorithm for adaptive alphabetic prefix-free coding that is worst-case optimal in terms of time and compression when $\sigma \in o \left( \frac{n^{1 / 2}}{\log n} \right)$, where $\sigma$ is the size of the alphabet and…
For the discrete memoryless sources with a countably infinite alphabet, we prove that for any positive integer $k$, there exists a corresponding probability interval such that if the largest symbol probability $p_{1}$ falls in this…