Related papers: Entropy coding with Variable Length Re-writing Sys…
Adaptive variable-length codes associate a variable-length codeword to the symbol being encoded depending on the previous symbols in the input string. This class of codes has been recently presented in [Dragos Trinca, arXiv:cs.DS/0505007]…
In this paper, we propose a source coding scheme that represents data from unknown distributions through frequency and support information. Existing encoding schemes often compress data by sacrificing computational efficiency or by assuming…
This paper investigates the problem of variable-length lossy source coding allowing a positive excess distortion probability and an overflow probability of codeword lengths. Novel one-shot achievability and converse bounds of the optimal…
We consider universal variable-to-fixed length compression of memoryless sources with a fidelity criterion. We design a dictionary codebook over the reproduction alphabet which is used to parse the source stream. Once a source subsequence…
Non-uniquely decodable codes can be defined as the codes that cannot be uniquely decoded without additional disambiguation information. These are mainly the class of non-prefix-free codes, where a codeword can be a prefix of other(s), and…
Learning, prediction, and compression are intimately connected: a model that accurately predicts the next symbol in a sequence can be coupled with a source coder to compress that sequence near its information-theoretic limit. When tokenized…
Many data compressors regularly encode probability distributions for entropy coding - requiring minimal description length type of optimizations. Canonical prefix/Huffman coding usually just writes lengths of bit sequences, this way…
This paper proposes a novel entropy encoding technique for lossless data compression. Representing a message string by its lexicographic index in the permutations of its symbols results in a compressed version matching Shannon entropy of…
We present shuffle coding, a general method for optimal compression of sequences of unordered objects using bits-back coding. Data structures that can be compressed using shuffle coding include multisets, graphs, hypergraphs, and others. We…
This article shows that any type of binary data can be defined as a collection from codewords of variable length. This feature helps us to define an Injective and surjective function from the suggested codewords to the required codewords.…
This paper presents new lower and upper bounds for the compression rate of binary prefix codes optimized over memoryless sources according to various nonlinear codeword length objectives. Like the most well-known redundancy bounds for…
The Run Length Encoding (RLE) compression method is a long standing simple lossless compression scheme which is easy to implement and achieves a good compression on input data which contains repeating consecutive symbols. In its pure form…
Text compression schemes and compact data structures usually combine sophisticated probability models with basic coding methods whose average codeword length closely match the entropy of known distributions. In the frequent case where basic…
Universal source coding at short blocklengths is considered for an exponential family of distributions. The \emph{Type Size} code has previously been shown to be optimal up to the third-order rate for universal compression of all memoryless…
We present a new information-theoretic definition and associated results, based on list decoding in a source coding setting. We begin by presenting list-source codes, which naturally map a key length (entropy) to list size. We then show…
Describes a near-linear-time algorithm for a variant of Huffman coding, in which the letters may have non-uniform lengths (as in Morse code), but with the restriction that each word to be encoded has equal probability. [See also ``Huffman…
Universal variable-to-fixed (V-F) length coding of $d$-dimensional exponential family of distributions is considered. We propose an achievable scheme consisting of a dictionary, used to parse the source output stream, making use of the…
In the Minimum Description Length (MDL) principle, learning from the data is equivalent to an optimal coding problem. We show that the codes that achieve optimal compression in MDL are critical in a very precise sense. First, when they are…
Permutation codes are a class of structured vector quantizers with a computationally-simple encoding procedure based on sorting the scalar components. Using a codebook comprising several permutation codes as subcodes preserves the…
There is a class of entropy-coding methods which do not substitute symbols by code words (such as Huffman coding), but operate on intervals or ranges. This class includes three prominent members: conventional arithmetic coding, range…