Related papers: Compressing Multisets with Large Alphabets using B…

Compressing Sets and Multisets of Sequences

This article describes lossless compression algorithms for multisets of sequences, taking advantage of the multiset's unordered structure. Multisets are a generalisation of sets where members are allowed to occur multiple times. A multiset…

Information Theory · Computer Science 2014-01-27 Christian Steinruecken

Random Edge Coding: One-Shot Bits-Back Coding of Large Labeled Graphs

We present a one-shot method for compressing large labeled graphs called Random Edge Coding. When paired with a parameter-free model based on Pólya's Urn, the worst-case computational and memory complexities scale quasi-linearly and…

Machine Learning · Computer Science 2023-05-18 Daniel Severo, James Townsend, Ashish Khisti, Alireza Makhzani

Compressing combinatorial objects

Most of the world's digital data is currently encoded in a sequential form, and compression methods for sequences have been studied extensively. However, there are many types of non-sequential data for which good compression techniques are…

Information Theory · Computer Science 2016-01-15 Christian Steinruecken

Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding

Latent variable models have been successfully applied in lossless compression with the bits-back coding algorithm. However, bits-back suffers from an increase in the bitrate equal to the KL divergence between the approximate posterior and…

Machine Learning · Computer Science 2021-06-16 Yangjun Ruan, Karen Ullrich, Daniel Severo, James Townsend, Ashish Khisti, Arnaud Doucet, Alireza Makhzani, Chris J. Maddison

Efficient and Compact Representations of Prefix Codes

Most of the attention in statistical compression is given to the space used by the compressed sequence, a problem completely solved with optimal prefix codes. However, in many applications, the storage space used to represent the prefix…

Data Structures and Algorithms · Computer Science 2015-06-30 Travis Gagie, Gonzalo Navarro, Yakov Nekrich, Alberto Ordóñez

Multidimensional Byte Pair Encoding: Shortened Sequences for Improved Visual Data Generation

In language processing, transformers benefit greatly from text being condensed. This is achieved through a larger vocabulary that captures word fragments instead of plain characters. This is often done with Byte Pair Encoding. In the…

Computer Vision and Pattern Recognition · Computer Science 2024-11-18 Tim Elsner, Paula Usinger, Julius Nehring-Wirxel, Gregor Kobsik, Victor Czech, Yanjiang He, Isaak Lim, Leif Kobbelt

Fast Recursive Coding Based on Grouping of Symbols

A novel fast recursive coding technique is proposed. It operates with only integer values not longer than 8 bits and is multiplication free. The recursion the algorithm is based on indirectly provides rather effective coding of symbols for very…

Information Theory · Computer Science 2007-08-22 Nikolay Ponomarenko, Vladimir Lukin, Karen Egiazarian, Jaakko Astola, Boris Y Ryabko

Bit-Swap: Recursive Bits-Back Coding for Lossless Compression with Hierarchical Latent Variables

The bits-back argument suggests that latent variable models can be turned into lossless compression schemes. Translating the bits-back argument into efficient and practical lossless compression schemes for general latent variable models,…

Machine Learning · Computer Science 2019-10-03 Friso H. Kingma, Pieter Abbeel, Jonathan Ho

A Compression Algorithm Using Mis-aligned Side-information

We study the problem of compressing a source sequence in the presence of side-information that is related to the source via insertions, deletions and substitutions. We propose a simple algorithm to compress the source sequence when the…

Information Theory · Computer Science 2016-11-15 Nan Ma, Kannan Ramchandran, David Tse

Byte Pair Encoding for Symbolic Music

When used with deep learning, the symbolic music modality is often coupled with language model architectures. To do so, the music needs to be tokenized, i.e. converted into a sequence of discrete tokens. This can be achieved by different…

Machine Learning · Computer Science 2023-11-14 Nathan Fradet, Nicolas Gutowski, Fabien Chhel, Jean-Pierre Briot

Compression with Flows via Local Bits-Back Coding

Likelihood-based generative models are the backbones of lossless compression due to the guaranteed existence of codes with lengths close to negative log likelihood. However, there is no guaranteed existence of computationally efficient…

Machine Learning · Computer Science 2020-01-07 Jonathan Ho, Evan Lohn, Pieter Abbeel

MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model

With the evolution of storage and communication protocols, ultra-low bitrate image compression has become a highly demanding topic. However, existing compression algorithms must sacrifice either consistency with the ground truth or…

Computer Vision and Pattern Recognition · Computer Science 2024-04-18 Chunyi Li, Guo Lu, Donghui Feng, Haoning Wu, Zicheng Zhang, Xiaohong Liu, Guangtao Zhai, Weisi Lin, Wenjun Zhang

Fast Codes for Large Alphabets

We address the problem of constructing a fast lossless code in the case when the source alphabet is large. The main idea of the new scheme may be described as follows. We group letters with small probabilities in subsets (acting as super…

Information Theory · Computer Science 2007-07-13 Boris Ryabko, Jaakko Astola, Karen Egiazarian

Large Alphabet Source Coding using Independent Component Analysis

Large alphabet source coding is a basic and well-studied problem in data compression. It has many applications such as compression of natural language text, speech and images. The classic perception of most commonly used methods is that a…

Information Theory · Computer Science 2016-07-26 Amichai Painsky, Saharon Rosset, Meir Feder

Getting Free Bits Back from Rotational Symmetries in LLMs

Current methods for compressing neural network weights, such as decomposition, pruning, quantization, and channel simulation, often overlook the inherent symmetries within these networks and thus waste bits on encoding redundant…

Information Theory · Computer Science 2024-10-03 Jiajun He, Gergely Flamich, José Miguel Hernández-Lobato

New Algorithms and Lower Bounds for Sequential-Access Data Compression

This thesis concerns sequential-access data compression, i.e., by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by…

Information Theory · Computer Science 2009-02-03 Travis Gagie

Practical Lossless Compression with Latent Variables using Bits Back Coding

Deep latent variable models have seen recent success in many data domains. Lossless compression is an application of these models which, despite having the potential to be highly useful, has yet to be implemented in a practical manner. We…

Machine Learning · Computer Science 2019-01-16 James Townsend, Tom Bird, David Barber

Learning to Localize Through Compressed Binary Maps

One of the main difficulties of scaling current localization systems to large environments is the on-board storage required for the maps. In this paper we propose to learn to compress the map representation such that it is optimal for the…

Computer Vision and Pattern Recognition · Computer Science 2020-12-22 Xinkai Wei, Ioan Andrei Bârsan, Shenlong Wang, Julieta Martinez, Raquel Urtasun

Optimal Lempel-Ziv based lossy compression for memoryless data: how to make the right mistakes

Compression refers to encoding data using bits, so that the representation uses as few bits as possible. Compression could be lossless (i.e. encoded data can be recovered exactly from its representation) or lossy where the data is…

Information Theory · Computer Science 2012-10-19 Narayana Santhanam, Dharmendra Modha

Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters

While deep neural networks are a highly successful model class, their large memory footprint puts considerable strain on energy consumption, communication bandwidth, and storage requirements. Consequently, model size reduction has become an…

Machine Learning · Statistics 2018-10-02 Marton Havasi, Robert Peharz, José Miguel Hernández-Lobato