Related papers: An Adaptive-Parity Error-Resilient LZ'77 Compressi…

Sublinear Algorithms for Approximating String Compressibility

We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE)…

Data Structures and Algorithms · Computer Science 2007-06-11 Sofya Raskhodnikova, Dana Ron, Ronitt Rubinfeld, Adam Smith

A simple online competitive adaptation of Lempel-Ziv compression with efficient random access support

We present a simple adaptation of the Lempel-Ziv '78 (LZ78) compression scheme (IEEE Transactions on Information Theory, 1978) that supports efficient random access to the input string. Namely, given query access to the compressed…

Data Structures and Algorithms · Computer Science 2013-01-14 Akashnil Dutta, Reut Levi, Dana Ron, Ronitt Rubinfeld

Optimal Universal Lossless Compression with Side Information

This paper presents conditional versions of the Lempel-Ziv (LZ) algorithm for settings where compressor and decompressor have access to the same side information. We propose a fixed-length-parsing LZ algorithm with side information, motivated…

Information Theory · Computer Science 2017-07-19 Yeohee Im, Sergio Verdú

Bit-Optimal Lempel-Ziv compression

One of the most famous and investigated lossless data-compression schemes is the one introduced by Lempel and Ziv about 40 years ago. This compression scheme is known as "dictionary-based compression" and consists of squeezing an input…

Data Structures and Algorithms · Computer Science 2008-02-07 Paolo Ferragina, Igor Nitto, Rossano Venturini

Optimal Lempel-Ziv based lossy compression for memoryless data: how to make the right mistakes

Compression refers to encoding data using bits, so that the representation uses as few bits as possible. Compression could be lossless (i.e. encoded data can be recovered exactly from its representation) or lossy, where the data is…

Information Theory · Computer Science 2012-10-19 Narayana Santhanam, Dharmendra Modha

Analyzing and Leveraging the $k$-Sensitivity of LZ77

We study the sensitivity of the Lempel-Ziv 77 compression algorithm to edits, showing how modifying a string $w$ can deteriorate or improve its compression. Our first result is a tight upper bound for $k$ edits: $\forall w' \in B(w,k)$, we…

Data Structures and Algorithms · Computer Science 2026-02-24 Gabriel Bathie, Paul Huber, Guillaume Lagarde, Akka Zemmari

An Optimal Unequal Error Protection LDPC Coded Recording System

For efficient modulation and error control coding, the deliberate flipping approach imposes the run-length-limited (RLL) constraint by bit error before recording. From the read side, a high coding rate limits the correcting capability of RLL…

Information Theory · Computer Science 2023-10-11 Hong-fu Chou

Space Efficient Linear Time Lempel-Ziv Factorization on Constant Size Alphabets

We present a new algorithm for computing the Lempel-Ziv Factorization (LZ77) of a given string of length $N$ in linear time, that utilizes only $N\log N + O(1)$ bits of working space, i.e., a single integer array, for constant size integer…

Data Structures and Algorithms · Computer Science 2013-10-08 Keisuke Goto, Hideo Bannai

Relative Lempel-Ziv Factorization for Efficient Storage and Retrieval of Web Collections

Compression techniques that support fast random access are a core component of any information system. Current state-of-the-art methods group documents into fixed-sized blocks and compress each block with a general-purpose adaptive…

Data Structures and Algorithms · Computer Science 2015-03-19 Christopher Hoobin, Simon J. Puglisi, Justin Zobel

Access Time Tradeoffs in Archive Compression

Web archives, query and proxy logs, and so on, can all be very large and highly repetitive; and are accessed only sporadically and partially, rather than continually and holistically. This type of data is ideal for compression-based…

Information Theory · Computer Science 2016-03-01 Matthias Petri, Alistair Moffat, P. C. Nagesh, Anthony Wirth

Range Predecessor and Lempel-Ziv Parsing

The Lempel-Ziv parsing of a string (LZ77 for short) is one of the most important and widely-used algorithmic tools in data compression and string processing. We show that the Lempel-Ziv parsing of a string of length $n$ on an alphabet of…

Data Structures and Algorithms · Computer Science 2015-07-28 Djamal Belazzougui, Simon J. Puglisi

Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

We study the approximate string matching and regular expression matching problems for the case when the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that…

Data Structures and Algorithms · Computer Science 2007-05-23 Philip Bille, Rolf Fagerberg, Inge Li Goertz

LZD-style Compression Scheme with Truncation and Repetitions

Lempel-Ziv-Double (LZD) is a variation of the LZ78 compression scheme that achieves better compression on repetitive datasets. Nevertheless, prior research has identified computational inefficiencies and a weakness in its compressibility…

Data Structures and Algorithms · Computer Science 2025-05-05 Linus Götz, Dominik Köppl

Lempel-Ziv-like Parsing in Small Space

Lempel-Ziv (LZ77 or, briefly, LZ) is one of the most effective and widely-used compressors for repetitive texts. However, the existing efficient methods computing the exact LZ parsing have to use linear or close to linear space to index the…

Data Structures and Algorithms · Computer Science 2020-05-12 Dmitry Kosolobov, Daniel Valenzuela, Gonzalo Navarro, Simon J. Puglisi

Compressing the Data Densely by New Geflochtener to Accelerate Web

In the present state of the internet, many optimization techniques exist to improve Web speed, but almost all are expensive in terms of bandwidth. So after a long investigation on different techniques to compress the data without any…

Information Theory · Computer Science 2014-05-20 Hemant Kumar Saini, Satpal Singh Kushwaha, C. Rama Krishna

LZ-Compressed String Dictionaries

We show how to compress string dictionaries using the Lempel-Ziv (LZ78) data compression algorithm. Our approach is validated experimentally on dictionaries of up to 1.5 GB of uncompressed text. We achieve compression ratios often…

Data Structures and Algorithms · Computer Science 2013-05-06 Julian Arz, Johannes Fischer

Practical and Effective Re-Pair Compression

Re-Pair is an efficient grammar compressor that operates by recursively replacing high-frequency character pairs with new grammar symbols. The most space-efficient linear-time algorithm computing Re-Pair uses $(1+\epsilon)n+\sqrt n$ words…

Data Structures and Algorithms · Computer Science 2017-04-28 Philip Bille, Inge Li Gørtz, Nicola Prezza

Lossy Compression in Near-Linear Time via Efficient Random Codebooks and Databases

The compression-complexity trade-off of lossy compression algorithms that are based on a random codebook or a random database is examined. Motivated, in part, by recent results of Gupta-Verdú-Weissman (GVW) and their underlying…

Information Theory · Computer Science 2009-04-23 Chris Gioran, Ioannis Kontoyiannis

Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Computing the LZ factorization (or LZ77 parsing) of a string is a computational bottleneck in many diverse applications, including data compression, text indexing, and pattern discovery. We describe new linear time LZ factorization…

Data Structures and Algorithms · Computer Science 2020-12-11 Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi

On Match Lengths, Zero Entropy and Large Deviations - with Application to Sliding Window Lempel-Ziv Algorithm

The Sliding Window Lempel-Ziv (SWLZ) algorithm, which makes use of recurrence times and match lengths, has been studied from various perspectives in the information theory literature. In this paper, we undertake a finer study of these quantities…

Information Theory · Computer Science 2016-11-17 Siddharth Jain, R. K. Bansal