Related papers: Low-Memory Adaptive Prefix Coding

Worst-case optimal adaptive alphabetic prefix-free coding

We give the first algorithm for adaptive alphabetic prefix-free coding that is worst-case optimal in terms of time and compression when $\sigma \in o \left( \frac{n^{1 / 2}}{\log n} \right)$, where $\sigma$ is the size of the alphabet and…

Data Structures and Algorithms · Computer Science 2026-01-08 Travis Gagie

Worst-Case Optimal Adaptive Prefix Coding

A common complaint about adaptive prefix coding is that it is much slower than static prefix coding. Karpinski and Nekrich recently took an important step towards resolving this: they gave an adaptive Shannon coding algorithm that encodes…

Information Theory · Computer Science 2008-12-18 Travis Gagie, Yakov Nekrich

Efficient and Compact Representations of Prefix Codes

Most of the attention in statistical compression is given to the space used by the compressed sequence, a problem completely solved with optimal prefix codes. However, in many applications, the storage space used to represent the prefix…

Data Structures and Algorithms · Computer Science 2015-06-30 Travis Gagie, Gonzalo Navarro, Yakov Nekrich, Alberto Ordóñez

Efficient and Compact Representations of Some Non-Canonical Prefix-Free Codes

For many kinds of prefix-free codes there are efficient and compact alternatives to the traditional tree-based representation. Since these put the codes into canonical form, however, they can only be used when we can choose the order in…

Data Structures and Algorithms · Computer Science 2021-04-02 Antonio Fariña, Travis Gagie, Szymon Grabowski, Giovanni Manzini, Gonzalo Navarro, Alberto Ordóñez

Fast and Compact Prefix Codes

It is well-known that, given a probability distribution over $n$ characters, in the worst case it takes $\Theta(n \log n)$ bits to store a prefix code with minimum expected codeword length. However, in this paper we first show that, for…

Data Structures and Algorithms · Computer Science 2009-05-20 Travis Gagie, Gonzalo Navarro, Yakov Nekrich

A nearly tight memory-redundancy trade-off for one-pass compression

Let $s$ be a string of length $n$ over an alphabet of constant size $\sigma$ and let $c$ and $\epsilon$ be constants with $1 \geq c \geq 0$ and $\epsilon > 0$. Using $O(n)$ time, $O(n^c)$ bits of memory and one pass we can always encode…

Information Theory · Computer Science 2007-08-15 Travis Gagie

Twenty (or so) Questions: $D$-ary Length-Bounded Prefix Coding

Efficient optimal prefix coding has long been accomplished via the Huffman algorithm. However, there is still room for improvement and exploration regarding variants of the Huffman problem. Length-limited Huffman coding, useful for many…

Information Theory · Computer Science 2007-07-13 Michael B. Baer

Faster Lightweight Lempel-Ziv Parsing

We present an algorithm that computes the Lempel-Ziv decomposition in $O(n(\log\sigma + \log\log n))$ time and $n\log\sigma + \epsilon n$ bits of space, where $\epsilon$ is a constant rational parameter, $n$ is the length of the input…

Data Structures and Algorithms · Computer Science 2015-06-09 Dmitry Kosolobov

Range Predecessor and Lempel-Ziv Parsing

The Lempel-Ziv parsing of a string (LZ77 for short) is one of the most important and widely-used algorithmic tools in data compression and string processing. We show that the Lempel-Ziv parsing of a string of length $n$ on an alphabet of…

Data Structures and Algorithms · Computer Science 2015-07-28 Djamal Belazzougui, Simon J. Puglisi

New Algorithms and Lower Bounds for Sequential-Access Data Compression

This thesis concerns sequential-access data compression, i.e., by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by…

Information Theory · Computer Science 2009-02-03 Travis Gagie

Reserved-Length Prefix Coding

Huffman coding finds an optimal prefix code for a given probability mass function. Consider situations in which one wishes to find an optimal code with the restriction that all codewords have lengths that lie in a user-specified set of…

Information Theory · Computer Science 2008-01-03 Michael B. Baer

Adaptive Learning of Compressible Strings

Suppose an oracle knows a string $S$ that is unknown to us and that we want to determine. The oracle can answer queries of the form "Is $s$ a substring of $S$?". In 1995, Skiena and Sundaram showed that, in the worst case, any algorithm…

Data Structures and Algorithms · Computer Science 2021-10-20 Gabriele Fici, Nicola Prezza, Rossano Venturini

Space-Efficient String Indexing for Wildcard Pattern Matching

In this paper we describe compressed indexes that support pattern matching queries for strings with wildcards. For a constant size alphabet our data structure uses $O(n\log^{\varepsilon}n)$ bits for any $\varepsilon>0$ and reports all…

Data Structures and Algorithms · Computer Science 2014-01-06 Moshe Lewenstein, Yakov Nekrich, Jeffrey Scott Vitter

More Efficient Algorithms and Analyses for Unequal Letter Cost Prefix-Free Coding

There is a large literature devoted to the problem of finding an optimal (min-cost) prefix-free code with an unequal letter-cost encoding alphabet of size. While there is no known polynomial time algorithm for solving it optimally there are…

Information Theory · Computer Science 2007-07-13 Mordecai Golin, Li Jian

Optimal Substring-Equality Queries with Applications to Sparse Text Indexing

We consider the problem of encoding a string of length $n$ from an integer alphabet of size $\sigma$ so that access and substring equality queries (that is, determining the equality of any two substrings) can be answered efficiently. Any…

Data Structures and Algorithms · Computer Science 2020-05-12 Nicola Prezza

Longest Common Prefixes with $k$-Errors and Applications

Although real-world text datasets, such as DNA sequences, are far from being uniformly random, average-case string searching algorithms perform significantly better than worst-case ones in most applications of interest. In this paper, we…

Data Structures and Algorithms · Computer Science 2018-01-16 Lorraine A. K. Ayad, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Solon P. Pissis

An Encoding for Order-Preserving Matching

Encoding data structures store enough information to answer the queries they are meant to support but not enough to recover their underlying datasets. In this paper we give the first encoding data structure for the challenging problem of…

Data Structures and Algorithms · Computer Science 2017-02-21 Travis Gagie, Giovanni Manzini, Rossano Venturini

Relations Between Greedy and Bit-Optimal LZ77 Encodings

This paper investigates the size in bits of the LZ77 encoding, which is the most popular and efficient variant of the Lempel-Ziv encodings used in data compression. We prove that, for a wide natural class of variable-length encoders for…

Discrete Mathematics · Computer Science 2018-01-10 Dmitry Kosolobov

Space-Efficient Construction of Compressed Indexes in Deterministic Linear Time

We show that the compressed suffix array and the compressed suffix tree of a string $T$ can be built in $O(n)$ deterministic time using $O(n\log\sigma)$ bits of space, where $n$ is the string length and $\sigma$ is the alphabet size.…

Data Structures and Algorithms · Computer Science 2016-11-15 J. Ian Munro, Gonzalo Navarro, Yakov Nekrich

Space-Efficient Huffman Codes Revisited

Canonical Huffman code is an optimal prefix-free compression code whose codewords enumerated in the lexicographical order form a list of binary words in non-decreasing lengths. Gagie et al. (2015) gave a representation of this coding…

Data Structures and Algorithms · Computer Science 2021-08-19 Szymon Grabowski, Dominik Köppl