Related papers

Related papers: Polylog space compression, pushdown compression, a…

200 papers

The pressing need for efficient compression schemes for XML documents has recently been focused on stack computation [6, 9], and in particular calls for a formulation of information-lossless stack or pushdown compressors that allows a formal…

Information Theory · Computer Science 2007-09-17 Pilar Albert, Elvira Mayordomo, Philippe Moser, Sylvain Perifel

We present a framework facilitating the implementation and comparison of text compression algorithms. We evaluate its features by a case study on two novel compression algorithms based on the Lempel-Ziv compression schemes that perform well…

Data Structures and Algorithms · Computer Science 2021-04-23 Patrick Dinklage, Johannes Fischer, Dominik Köppl, Marvin Löbel, Kunihiko Sadakane

We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE)…

Data Structures and Algorithms · Computer Science 2007-06-11 Sofya Raskhodnikova, Dana Ron, Ronitt Rubinfeld, Adam Smith
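To make "compressibility with respect to RLE" concrete: the size of a run-length encoding grows with the number of maximal runs, so the run count is the basic quantity to approximate. The Python sketch below is illustrative only; `rle_encode`, `count_runs`, and `estimate_runs` are hypothetical helper names, and the naive sampling estimator is a simple stand-in, not the sublinear-time algorithm from the paper.

```python
from __future__ import annotations

import random
from itertools import groupby


def rle_encode(s: str) -> list[tuple[str, int]]:
    """Run-length encode s as (symbol, run length) pairs."""
    return [(ch, sum(1 for _ in grp)) for ch, grp in groupby(s)]


def count_runs(s: str) -> int:
    """Exact number of maximal runs, i.e. the number of pairs RLE emits."""
    return sum(1 for i in range(len(s)) if i == 0 or s[i] != s[i - 1])


def estimate_runs(s: str, samples: int, seed: int = 0) -> float:
    """Naive sampling estimate of the run count (hypothetical, illustrative).

    Picks positions uniformly at random, checks whether each one starts a
    run, and scales the hit fraction by len(s). Each sample inspects at most
    two characters, so the work depends on `samples`, not on len(s).
    """
    n = len(s)
    if n == 0:
        return 0.0
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        i = rng.randrange(n)
        if i == 0 or s[i] != s[i - 1]:
            hits += 1
    return n * hits / samples
```

For example, `rle_encode("aaabccdddd")` gives `[('a', 3), ('b', 1), ('c', 2), ('d', 4)]` and `count_runs` on the same string returns 4. The estimator is unbiased for the run count but only gives additive-error guarantees, which is why more careful techniques are needed for useful approximations.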

Lempel-Ziv-Double (LZD) is a variation of the LZ78 compression scheme that achieves better compression on repetitive datasets. Nevertheless, prior research has identified computational inefficiencies and a weakness in its compressibility…

Data Structures and Algorithms · Computer Science 2025-05-05 Linus Götz, Dominik Köppl

In this paper we introduce a variant of pushdown dimension called bounded pushdown (BPD) dimension, that measures the density of information contained in a sequence, relative to a BPD automaton, i.e. a finite state machine equipped with an…

Computational Complexity · Computer Science 2007-07-13 Pilar Albert, Elvira Mayordomo, Philippe Moser

One of the most famous and most investigated lossless data-compression schemes is the one introduced by Lempel and Ziv about 40 years ago. This compression scheme is known as "dictionary-based compression" and consists of squeezing an input…

Data Structures and Algorithms · Computer Science 2008-02-07 Paolo Ferragina, Igor Nitto, Rossano Venturini

Indexing highly repetitive collections has become a relevant problem with the emergence of large repositories of versioned documents, among other applications. These collections may reach huge sizes, but are formed mostly of documents that…

Information Retrieval · Computer Science 2016-05-25 Francisco Claude, Antonio Fariña, Miguel A. Martínez-Prieto, Gonzalo Navarro

This paper presents conditional versions of Lempel-Ziv (LZ) algorithm for settings where compressor and decompressor have access to the same side information. We propose a fixed-length-parsing LZ algorithm with side information, motivated…

Information Theory · Computer Science 2017-07-19 Yeohee Im, Sergio Verdú

The compression-complexity trade-off of lossy compression algorithms that are based on a random codebook or a random database is examined. Motivated, in part, by recent results of Gupta-Verdú-Weissman (GVW) and their underlying…

Information Theory · Computer Science 2009-04-23 Chris Gioran, Ioannis Kontoyiannis

Compression techniques that support fast random access are a core component of any information system. Current state-of-the-art methods group documents into fixed-sized blocks and compress each block with a general-purpose adaptive…

Data Structures and Algorithms · Computer Science 2015-03-19 Christopher Hoobin, Simon J. Puglisi, Justin Zobel

Modern model hubs, such as Hugging Face, store tens of petabytes of LLMs, with fine-tuned variants vastly outnumbering base models and dominating storage consumption. Existing storage reduction techniques -- such as deduplication and…

Databases · Computer Science 2025-11-11 Zirui Wang, Tingfeng Lan, Zhaoyuan Su, Juncheng Yang, Yue Cheng

Lempel-Ziv (LZ77 or, briefly, LZ) is one of the most effective and widely-used compressors for repetitive texts. However, the existing efficient methods computing the exact LZ parsing have to use linear or close to linear space to index the…

Data Structures and Algorithms · Computer Science 2020-05-12 Dmitry Kosolobov, Daniel Valenzuela, Gonzalo Navarro, Simon J. Puglisi
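For reference, the exact LZ77 parsing mentioned above can be stated as a greedy procedure: each phrase is either a fresh character or the longest prefix of the remaining text that also starts at an earlier position (the source may overlap the phrase itself). The sketch below, with hypothetical names `lz77_parse` and `lz77_decode`, only illustrates that definition; it brute-forces every earlier position and is far slower than, and unrelated to, the small-space methods the paper studies.

```python
from __future__ import annotations


def lz77_parse(t: str) -> list[tuple[int, int] | str]:
    """Greedy LZ77 parsing by brute force (illustrative, not efficient).

    A phrase is either a literal character (str) or a (source, length)
    pair copying `length` characters that start at an earlier `source`.
    """
    phrases: list[tuple[int, int] | str] = []
    i, n = 0, len(t)
    while i < n:
        best_len, best_src = 0, -1
        for j in range(i):  # try every earlier starting position
            length = 0
            # j < i, so j + length < i + length < n keeps indices valid;
            # j + length may pass i, which allows self-overlapping sources
            while i + length < n and t[j + length] == t[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, j
        if best_len == 0:
            phrases.append(t[i])  # character not seen before: literal
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases


def lz77_decode(phrases: list[tuple[int, int] | str]) -> str:
    out: list[str] = []
    for p in phrases:
        if isinstance(p, str):
            out.append(p)
        else:
            src, length = p
            for k in range(length):  # char-by-char copy handles overlap
                out.append(out[src + k])
    return "".join(out)
```

For example, `lz77_parse("abababa")` yields `['a', 'b', (0, 5)]`: the third phrase copies five characters starting at position 0 and overlaps itself, which the character-by-character copy in the decoder handles.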

We show how to compress string dictionaries using the Lempel-Ziv (LZ78) data compression algorithm. Our approach is validated experimentally on dictionaries of up to 1.5 GB of uncompressed text. We achieve compression ratios often…

Data Structures and Algorithms · Computer Science 2013-05-06 Julian Arz, Johannes Fischer
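As background for the LZ78 entry above, here is a minimal sketch of plain LZ78 factorization: each new phrase is the longest previously seen phrase extended by one character, emitted as a (phrase id, character) pair with id 0 standing for the empty phrase. The names `lz78_parse` and `lz78_decode` are hypothetical, and the sketch shows only the textbook parsing of a single string, not the authors' representation for string dictionaries.

```python
from __future__ import annotations


def lz78_parse(t: str) -> list[tuple[int, str]]:
    """Textbook LZ78 factorization of t into (phrase id, char) pairs."""
    trie: dict[tuple[int, str], int] = {}  # (parent id, char) -> phrase id
    out: list[tuple[int, str]] = []
    node = 0  # current phrase id; 0 is the empty phrase
    for ch in t:
        if (node, ch) in trie:
            node = trie[(node, ch)]  # keep extending a known phrase
        else:
            trie[(node, ch)] = len(trie) + 1  # register a new phrase
            out.append((node, ch))
            node = 0
    if node != 0:
        # Input ended inside an existing phrase; emitting (id, "") is a
        # convention chosen here for simplicity.
        out.append((node, ""))
    return out


def lz78_decode(pairs: list[tuple[int, str]]) -> str:
    phrases = [""]  # phrase 0 is the empty string
    out: list[str] = []
    for idx, ch in pairs:
        p = phrases[idx] + ch
        phrases.append(p)
        out.append(p)
    return "".join(out)
```

For example, `lz78_parse("abababa")` gives `[(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]`, i.e. the phrases a, b, ab, aba. Keying the trie on (parent id, character) keeps each step to a single dictionary lookup.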

As Machine Learning (ML) techniques nowadays generate huge data collections, the problem of how to efficiently engineer their storage and operations is becoming of paramount importance. In this article we propose a new lossless…

Data Structures and Algorithms · Computer Science 2022-03-31 Paolo Ferragina, Travis Gagie, Dominik Köppl, Giovanni Manzini, Gonzalo Navarro, Manuel Striani, Francesco Tosoni

Compression refers to encoding data using bits so that the representation uses as few bits as possible. Compression can be lossless (i.e., encoded data can be recovered exactly from its representation) or lossy, where the data is…

Information Theory · Computer Science 2012-10-19 Narayana Santhanam, Dharmendra Modha

This article gives a self-contained analysis of the performance of the Lempel-Ziv compression algorithm on (hidden) Markovian sources. Specifically we include a full proof of the assertion that the compression rate approaches the entropy…

Information Theory · Computer Science 2019-10-03 Madhu Sudan, David Xiang

This paper delves into recent hardware implementations of the Lempel-Ziv 4 (LZ4) algorithm, highlighting two key factors that limit the throughput of single-kernel compressors. Firstly, the actual parallelism exhibited in single-kernel…

Hardware Architecture · Computer Science 2024-09-20 Tao Chen, Suwen Song, Zhongfeng Wang

Lempel-Ziv is an easy-to-compute member of a wide family of so-called macro schemes; it restricts pointers to go in one direction only. Optimal bidirectional macro schemes are NP-complete to find, but they may provide much better…

Data Structures and Algorithms · Computer Science 2020-03-06 Luís M. S. Russo, Ana D. Correia, Gonzalo Navarro, Alexandre P. Francisco

The Sliding Window Lempel-Ziv (SWLZ) algorithm that makes use of recurrence times and match lengths has been studied from various perspectives in information theory literature. In this paper, we undertake a finer study of these quantities…

Information Theory · Computer Science 2016-11-17 Siddharth Jain, R. K. Bansal

The LZ-End parsing [Kreft & Navarro, 2011] of an input string yields compression competitive with the popular Lempel-Ziv 77 scheme, but also allows for efficient random access. Kempa and Kosolobov showed that the parsing can be computed in…

Data Structures and Algorithms · Computer Science 2024-09-18 Patrick Dinklage