Related papers: Fast and Compact Prefix Codes

Efficient and Compact Representations of Prefix Codes

Most of the attention in statistical compression is given to the space used by the compressed sequence, a problem completely solved with optimal prefix codes. However, in many applications, the storage space used to represent the prefix…

Data Structures and Algorithms · Computer Science · 2015-06-30 · Travis Gagie, Gonzalo Navarro, Yakov Nekrich, Alberto Ordóñez

Efficient and Compact Representations of Some Non-Canonical Prefix-Free Codes

For many kinds of prefix-free codes there are efficient and compact alternatives to the traditional tree-based representation. Since these put the codes into canonical form, however, they can only be used when we can choose the order in…

Data Structures and Algorithms · Computer Science · 2021-04-02 · Antonio Fariña, Travis Gagie, Szymon Grabowski, Giovanni Manzini, Gonzalo Navarro, Alberto Ordóñez
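The canonical form mentioned in the abstract above can be illustrated with a minimal sketch. It is the textbook canonical assignment, not the representation from either paper, and the function name `canonical_codes` is hypothetical. Given only the codeword lengths, sorting symbols by length (ties broken by symbol, an assumption; any fixed order works) and counting upward fixes every codeword. That is why canonical representations apply only when the order of the codewords is ours to choose.

```python
from fractions import Fraction


def canonical_codes(lengths):
    """Assign canonical codewords from a dict mapping symbol -> codeword length.

    Hypothetical helper for illustration. Ties between equal lengths are
    broken by symbol order (an assumption; any fixed order works).
    """
    if not lengths:
        return {}
    if any(l < 1 for l in lengths.values()):
        raise ValueError("codeword lengths must be positive")
    # Kraft inequality: a prefix code with these lengths exists iff sum 2^-l <= 1.
    if sum(Fraction(1, 2 ** l) for l in lengths.values()) > 1:
        raise ValueError("lengths violate the Kraft inequality")

    order = sorted(lengths, key=lambda s: (lengths[s], s))
    codes = {}
    code = 0
    prev_len = lengths[order[0]]
    for i, sym in enumerate(order):
        length = lengths[sym]
        if i > 0:
            # Next codeword: increment, then append zeros if the length grew.
            code = (code + 1) << (length - prev_len)
        codes[sym] = format(code, f"0{length}b")
        prev_len = length
    return codes


print(canonical_codes({"a": 1, "b": 2, "c": 3, "d": 3}))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```

Because the codewords follow from the lengths and the chosen order, storing the sorted lengths suffices to rebuild the whole code.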

Low-Memory Adaptive Prefix Coding

In this paper we study the adaptive prefix coding problem in cases where the size of the input alphabet is large. We present an online prefix coding algorithm that uses $O(\sigma^{1/\lambda + \epsilon})$ bits of space for any constants…

Data Structures and Algorithms · Computer Science · 2008-11-24 · Travis Gagie, Marek Karpinski, Yakov Nekrich

Finding Short Synchronizing Words for Prefix Codes

We study the problems of finding a shortest synchronizing word and its length for a given prefix code. This is done in two different settings: when the code is defined by an arbitrary decoder recognizing its star and when the code is…

Formal Languages and Automata Theory · Computer Science · 2018-06-19 · Andrew Ryzhikov, Marek Szykuła

Worst-Case Optimal Adaptive Prefix Coding

A common complaint about adaptive prefix coding is that it is much slower than static prefix coding. Karpinski and Nekrich recently took an important step towards resolving this: they gave an adaptive Shannon coding algorithm that encodes…

Information Theory · Computer Science · 2008-12-18 · Travis Gagie, Yakov Nekrich

A nearly tight memory-redundancy trade-off for one-pass compression

Let $s$ be a string of length $n$ over an alphabet of constant size $\sigma$ and let $c$ and $\epsilon$ be constants with $1 \geq c \geq 0$ and $\epsilon > 0$. Using $O(n)$ time, $O(n^c)$ bits of memory and one pass we can always encode…

Information Theory · Computer Science · 2007-08-15 · Travis Gagie

An Encoding for Order-Preserving Matching

Encoding data structures store enough information to answer the queries they are meant to support but not enough to recover their underlying datasets. In this paper we give the first encoding data structure for the challenging problem of…

Data Structures and Algorithms · Computer Science · 2017-02-21 · Travis Gagie, Giovanni Manzini, Rossano Venturini

Prefix Codes for Power Laws with Countable Support

In prefix coding over an infinite alphabet, methods that consider specific distributions generally consider those that decline more quickly than a power law (e.g., Golomb coding). Particular power-law distributions, however, model many…

Information Theory · Computer Science · 2009-03-06 · Michael B. Baer

Breaking the $O(n)$-Barrier in the Construction of Compressed Suffix Arrays and Suffix Trees

The suffix array and the suffix tree are the two most fundamental data structures for string processing. For a length-$n$ text, however, they use $\Theta(n \log n)$ bits of space, which is often too costly. To address this, Grossi and…

Data Structures and Algorithms · Computer Science · 2023-04-20 · Dominik Kempa, Tomasz Kociumaka

Space-Efficient String Indexing for Wildcard Pattern Matching

In this paper we describe compressed indexes that support pattern matching queries for strings with wildcards. For a constant size alphabet our data structure uses $O(n\log^{\varepsilon}n)$ bits for any $\varepsilon>0$ and reports all…

Data Structures and Algorithms · Computer Science · 2014-01-06 · Moshe Lewenstein, Yakov Nekrich, Jeffrey Scott Vitter

Wee LCP

We prove that longest common prefix (LCP) information can be stored in much less space than previously known. More precisely, we show that in the presence of the text and the suffix array, $o(n)$ additional bits are sufficient to answer…

Data Structures and Algorithms · Computer Science · 2010-02-19 · Johannes Fischer

Optimal Prefix Codes with Fewer Distinct Codeword Lengths are Faster to Construct

A new method for constructing minimum-redundancy binary prefix codes is described. Our method does not explicitly build a Huffman tree; instead it uses a property of optimal prefix codes to compute the codeword lengths corresponding to the…

Data Structures and Algorithms · Computer Science · 2016-09-30 · Ahmed Belal, Amr Elmasry

Optimal compression of hash-origin prefix trees

A common problem is operating on the hash values of the elements of a database. This paper analyzes the informational content of this general task and how to approach the resulting lower bounds in practice. Minimal prefix…

Information Theory · Computer Science · 2012-07-10 · Jarek Duda

Compressed Index with Construction in Compressed Space

Suppose that we are given a string $s$ of length $n$ over an alphabet $\{0,1,\ldots,n^{O(1)}\}$ and $\delta$ is the string complexity of $s$, a known compression measure. We describe an index on $s$ with $O(\delta\log\frac{n}{\delta})$…

Data Structures and Algorithms · Computer Science · 2026-04-15 · Dmitry Kosolobov

A fast and simple $O (z \log n)$-space index for finding approximately longest common substrings

We describe how, given a text $T [1..n]$ and a positive constant $\epsilon$, we can build a simple $O (z \log n)$-space index, where $z$ is the number of phrases in the LZ77 parse of $T$, such that later, given a pattern $P [1..m]$, in $O…

Data Structures and Algorithms · Computer Science · 2022-12-06 · Nick Fagan, Jorge Hermo González, Travis Gagie

Run Compressed Rank/Select for Large Alphabets

Given a string of length $n$ that is composed of $r$ runs of letters from the alphabet $\{0,1,\ldots,\sigma{-}1\}$ such that $2 \le \sigma \le r$, we describe a data structure that, provided $r \le n / \log^{\omega(1)} n$, stores the string…

Data Structures and Algorithms · Computer Science · 2018-02-27 · José Fuentes-Sepúlveda, Juha Kärkkäinen, Dmitry Kosolobov, Simon J. Puglisi

Reserved-Length Prefix Coding

Huffman coding finds an optimal prefix code for a given probability mass function. Consider situations in which one wishes to find an optimal code with the restriction that all codewords have lengths that lie in a user-specified set of…

Information Theory · Computer Science · 2008-01-03 · Michael B. Baer
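The reserved-length abstract above takes unrestricted Huffman coding as its starting point. As a reference point only, here is a minimal sketch of standard Huffman codeword-length computation with Python's `heapq`; it does not implement the reserved-length restriction studied in that paper, and the name `huffman_lengths` is hypothetical. Each merge of the two lightest subtrees pushes every symbol inside them one level deeper, so counting merges per symbol yields its codeword length.

```python
import heapq
from itertools import count


def huffman_lengths(weights):
    """Return Huffman codeword lengths for a dict mapping symbol -> weight.

    Hypothetical helper for illustration; weights may be probabilities
    or integer frequencies.
    """
    if len(weights) == 1:
        return {s: 1 for s in weights}  # a lone symbol still needs one bit
    tie = count()  # tie-breaker so heapq never has to compare symbol lists
    heap = [(w, next(tie), [s]) for s, w in weights.items()]
    heapq.heapify(heap)
    lengths = {s: 0 for s in weights}
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees moves every symbol in them one level deeper.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, next(tie), syms1 + syms2))
    return lengths


print(huffman_lengths({"a": 4, "b": 2, "c": 1, "d": 1}))
# {'a': 1, 'b': 2, 'c': 3, 'd': 3}
```

For these dyadic weights the average codeword length is (4×1 + 2×2 + 1×3 + 1×3) / 8 = 1.75 bits per symbol, which equals the entropy of the distribution (1/2, 1/4, 1/8, 1/8).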

Packed Compact Tries: A Fast and Efficient Data Structure for Online String Processing

In this paper, we present a new data structure called the packed compact trie (packed c-trie) which stores a set $S$ of $k$ strings of total length $n$ in $n \log\sigma + O(k \log n)$ bits of space and supports fast pattern matching queries…

Data Structures and Algorithms · Computer Science · 2017-10-11 · Takuya Takagi, Shunsuke Inenaga, Kunihiko Sadakane, Hiroki Arimura

Worst-case optimal adaptive alphabetic prefix-free coding

We give the first algorithm for adaptive alphabetic prefix-free coding that is worst-case optimal in terms of time and compression when $\sigma \in o \left( \frac{n^{1 / 2}}{\log n} \right)$, where $\sigma$ is the size of the alphabet and…

Data Structures and Algorithms · Computer Science · 2026-01-08 · Travis Gagie

About Optimal Prefix Codes over Countably Infinite Alphabets: Probabilistic Intervals for the Codeword Lengths Assignment

For the discrete memoryless sources with a countably infinite alphabet, we prove that for any positive integer $k$, there exists a corresponding probability interval such that if the largest symbol probability $p_{1}$ falls in this…

Information Theory · Computer Science · 2026-04-21 · Hongyang Liu, Wei Yan