Related papers: Compressing Multisets with Large Alphabets using B…
This article describes lossless compression algorithms for multisets of sequences, taking advantage of the multiset's unordered structure. Multisets are a generalisation of sets where members are allowed to occur multiple times. A multiset…
We present a one-shot method for compressing large labeled graphs called Random Edge Coding. When paired with a parameter-free model based on P\'olya's Urn, the worst-case computational and memory complexities scale quasi-linearly and…
Most of the world's digital data is currently encoded in a sequential form, and compression methods for sequences have been studied extensively. However, there are many types of non-sequential data for which good compression techniques are…
Latent variable models have been successfully applied in lossless compression with the bits-back coding algorithm. However, bits-back suffers from an increase in the bitrate equal to the KL divergence between the approximate posterior and…
Most of the attention in statistical compression is given to the space used by the compressed sequence, a problem completely solved with optimal prefix codes. However, in many applications, the storage space used to represent the prefix…
In language processing, transformers benefit greatly from text being condensed. This is achieved through a larger vocabulary that captures word fragments instead of plain characters. This is often done with Byte Pair Encoding. In the…
A novel fast recursive coding technique is proposed. It operates with only integer values not longer 8 bits and is multiplication free. Recursion the algorithm is based on indirectly provides rather effective coding of symbols for very…
The bits-back argument suggests that latent variable models can be turned into lossless compression schemes. Translating the bits-back argument into efficient and practical lossless compression schemes for general latent variable models,…
We study the problem of compressing a source sequence in the presence of side-information that is related to the source via insertions, deletions and substitutions. We propose a simple algorithm to compress the source sequence when the…
When used with deep learning, the symbolic music modality is often coupled with language model architectures. To do so, the music needs to be tokenized, i.e. converted into a sequence of discrete tokens. This can be achieved by different…
Likelihood-based generative models are the backbones of lossless compression due to the guaranteed existence of codes with lengths close to negative log likelihood. However, there is no guaranteed existence of computationally efficient…
With the evolution of storage and communication protocols, ultra-low bitrate image compression has become a highly demanding topic. However, existing compression algorithms must sacrifice either consistency with the ground truth or…
We address the problem of constructing a fast lossless code in the case when the source alphabet is large. The main idea of the new scheme may be described as follows. We group letters with small probabilities in subsets (acting as super…
Large alphabet source coding is a basic and well-studied problem in data compression. It has many applications such as compression of natural language text, speech and images. The classic perception of most commonly used methods is that a…
Current methods for compressing neural network weights, such as decomposition, pruning, quantization, and channel simulation, often overlook the inherent symmetries within these networks and thus waste bits on encoding redundant…
This thesis concerns sequential-access data compression, i.e., by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by…
Deep latent variable models have seen recent success in many data domains. Lossless compression is an application of these models which, despite having the potential to be highly useful, has yet to be implemented in a practical manner. We…
One of the main difficulties of scaling current localization systems to large environments is the on-board storage required for the maps. In this paper we propose to learn to compress the map representation such that it is optimal for the…
Compression refers to encoding data using bits, so that the representation uses as few bits as possible. Compression could be lossless: i.e. encoded data can be recovered exactly from its representation) or lossy where the data is…
While deep neural networks are a highly successful model class, their large memory footprint puts considerable strain on energy consumption, communication bandwidth, and storage requirements. Consequently, model size reduction has become an…