Related papers: New Algorithms and Lower Bounds for Sequential-Acc…

On the Value of Multiple Read/Write Streams for Data Compression

We study whether, when restricted to using polylogarithmic memory and polylogarithmic passes, we can achieve qualitatively better data compression with multiple read/write streams than we can with only one. We first show how we can achieve…

Data Structures and Algorithms · Computer Science 2012-04-06 Travis Gagie

Data compression and learning in time sequences analysis

Motivated by the problem of the definition of a distance between two sequences of characters, we investigate the so-called learning process of typical sequential data compression schemes. We focus on the problem of how a compression…

Statistical Mechanics · Physics 2009-11-07 A. Puglisi, D. Benedetto, E. Caglioti, V. Loreto, A. Vulpiani

Cryptographic Compression

We introduce a protocol called ENCORE which simultaneously compresses and encrypts data in a one-pass process that can be implemented efficiently and possesses a number of desirable features as a streaming encoder/decoder. Motivated by the…

Cryptography and Security · Computer Science 2025-01-28 Joshua Cooper, Grant Fickes

Data Compression with Relative Entropy Coding

Over the last few years, machine learning unlocked previously infeasible features for compression, such as providing guarantees for users' privacy or tailoring compression to specific data statistics (e.g., satellite images or audio…

Information Theory · Computer Science 2026-03-25 Gergely Flamich

Entropy bounds for grammar compression

Grammar compression represents a string as a context free grammar. Achieving compression requires encoding such grammar as a binary string; there are a few commonly used encodings. We bound the size of practically used encodings for several…

Data Structures and Algorithms · Computer Science 2020-05-21 Michał Gańczorz

At the Roots of Dictionary Compression: String Attractors

A well-known fact in the field of lossless text compression is that high-order entropy is a weak model when the input contains long repetitions. Motivated by this, decades of research have generated myriads of so-called dictionary…

Data Structures and Algorithms · Computer Science 2020-12-17 Dominik Kempa, Nicola Prezza

Compression of data streams down to their information content

According to Kolmogorov complexity, every finite binary string is compressible to a shortest code -- its information content -- from which it is effectively recoverable. We investigate the extent to which this holds for infinite binary…

Information Theory · Computer Science 2019-01-23 George Barmpalias, Andrew Lewis-Pye

A Compression Algorithm Using Mis-aligned Side-information

We study the problem of compressing a source sequence in the presence of side-information that is related to the source via insertions, deletions and substitutions. We propose a simple algorithm to compress the source sequence when the…

Information Theory · Computer Science 2016-11-15 Nan Ma, Kannan Ramchandran, David Tse

Access Pattern-Based Code Compression for Memory-Constrained Embedded Systems

As compared to a large spectrum of performance optimizations, relatively little effort has been dedicated to optimize other aspects of embedded applications such as memory space requirements, power, real-time predictability, and…

Other Computer Science · Computer Science 2011-11-09 O. Ozturk, H. Saputra, M. Kandemir, I. Kolcu

CRAM: Compressed Random Access Memory

We present a new data structure called the Compressed Random Access Memory (CRAM) that can store a dynamic string T of characters, e.g., representing the memory of a computer, in compressed form while achieving asymptotically…

Data Structures and Algorithms · Computer Science 2015-03-17 Jesper Jansson, Kunihiko Sadakane, Wing-Kin Sung

Optimal Random Access and Conditional Lower Bounds for 2D Compressed Strings

Compressed indexing is a powerful technique that enables efficient querying over data stored in compressed form, significantly reducing memory usage and often accelerating computation. While extensive progress has been made for…

Data Structures and Algorithms · Computer Science 2025-10-23 Rajat De, Dominik Kempa

Once-for-All Sequence Compression for Self-Supervised Speech Models

The sequence length along the time axis is often the dominant factor of the computation in speech processing. Works have been proposed to reduce the sequence length for lowering the computational cost in self-supervised speech models.…

Computation and Language · Computer Science 2023-05-10 Hsuan-Jui Chen, Yen Meng, Hung-yi Lee

Practical Random Access to SLP-Compressed Texts

Grammar-based compression is a popular and powerful approach to compressing repetitive texts but until recently its relatively poor time-space trade-offs during real-life construction made it impractical for truly massive datasets such as…

Data Structures and Algorithms · Computer Science 2020-07-21 Travis Gagie, Tomohiro I, Giovanni Manzini, Gonzalo Navarro, Hiroshi Sakamoto, Louisa Seelbach Benkner, Yoshimasa Takabatake

An Introduction to Neural Data Compression

Neural compression is the application of neural networks and other machine learning methods to data compression. Recent advances in statistical machine learning have opened up new possibilities for data compression, allowing compression…

Machine Learning · Computer Science 2023-08-22 Yibo Yang, Stephan Mandt, Lucas Theis

Random Access to Grammar Compressed Strings

Grammar based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. In this paper, we present a novel…

Data Structures and Algorithms · Computer Science 2013-10-30 Philip Bille, Gad M. Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, Oren Weimann

Lossless Compression of Time Series Data: A Comparative Study

Our increasingly digital and connected world has led to the generation of unprecedented amounts of data. This data must be efficiently managed, transmitted, and stored to preserve resources and allow scalability. Data compression has…

Information Theory · Computer Science 2025-10-09 Jonas G. Matt, Pengcheng Huang, Balz Maag

Unidirectional Input/Output Streaming Complexity of Reversal and Sorting

We consider unidirectional data streams with restricted access, such as read-only and write-only streams. For read-write streams, we also introduce a new complexity measure called expansion, the ratio between the space used on the stream…

Data Structures and Algorithms · Computer Science 2014-07-21 Nathanaël François, Rahul Jain, Frederic Magniez

Large Alphabet Source Coding using Independent Component Analysis

Large alphabet source coding is a basic and well-studied problem in data compression. It has many applications such as compression of natural language text, speech and images. The classic perception of most commonly used methods is that a…

Information Theory · Computer Science 2016-07-26 Amichai Painsky, Saharon Rosset, Meir Feder

On Finite Memory Universal Data Compression and Classification of Individual Sequences

Consider the case where consecutive blocks of N letters of a semi-infinite individual sequence X over a finite-alphabet are being compressed into binary sequences by some one-to-one mapping. No a-priori information about X is available at…

Information Theory · Computer Science 2013-01-25 Jacob Ziv

Domain Specific Hierarchical Huffman Encoding

In this paper, we revisit the classical data compression problem for domain-specific texts. It is well known that the classical Huffman algorithm is optimal with respect to prefix encoding and that the compression is done at the character level. Since…

Information Theory · Computer Science 2013-07-04 K. Ilambharathi, G. S. N. V. Venkata Manik, N. Sadagopan, B. Sivaselvan