Related papers: Sublinear Algorithms for Approximating String Comp…

Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

We study the approximate string matching and regular expression matching problem for the case when the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that…

Data Structures and Algorithms · Computer Science 2007-05-23 Philip Bille, Rolf Fagerberg, Inge Li Goertz

Lempel-Ziv-like Parsing in Small Space

Lempel-Ziv (LZ77 or, briefly, LZ) is one of the most effective and widely-used compressors for repetitive texts. However, the existing efficient methods computing the exact LZ parsing have to use linear or close to linear space to index the…

Data Structures and Algorithms · Computer Science 2020-05-12 Dmitry Kosolobov, Daniel Valenzuela, Gonzalo Navarro, Simon J. Puglisi

Incongruity-sensitive access to highly compressed strings

Random access to highly compressed strings -- represented by straight-line programs or Lempel-Ziv parses, for example -- is a well-studied topic. Random access to such strings in strongly sublogarithmic time is impossible in the worst case,…

Data Structures and Algorithms · Computer Science 2026-02-05 Ferdinando Cicalese, Zsuzsanna Lipták, Travis Gagie, Gonzalo Navarro, Nicola Prezza, Cristian Urbina

Faster subsequence recognition in compressed strings

Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which is closely related to…

Data Structures and Algorithms · Computer Science 2011-11-10 Alexander Tiskin

Hierarchical Relative Lempel-Ziv Compression

Relative Lempel-Ziv (RLZ) parsing is a dictionary compression method in which a string $S$ is compressed relative to a second string $R$ (called the reference) by parsing $S$ into a sequence of substrings that occur in $R$. RLZ is…

Data Structures and Algorithms · Computer Science 2022-08-25 Philip Bille, Inge Li Gørtz, Simon J. Puglisi, Simon R. Tarnow

LZ-Compressed String Dictionaries

We show how to compress string dictionaries using the Lempel-Ziv (LZ78) data compression algorithm. Our approach is validated experimentally on dictionaries of up to 1.5 GB of uncompressed text. We achieve compression ratios often…

Data Structures and Algorithms · Computer Science 2013-05-06 Julian Arz, Johannes Fischer

Bit-Optimal Lempel-Ziv compression

One of the most famous and investigated lossless data-compression schemes is the one introduced by Lempel and Ziv about 40 years ago. This compression scheme is known as "dictionary-based compression" and consists of squeezing an input…

Data Structures and Algorithms · Computer Science 2008-02-07 Paolo Ferragina, Igor Nitto, Rossano Venturini

RLZAP: Relative Lempel-Ziv with Adaptive Pointers

Relative Lempel-Ziv (RLZ) is a popular algorithm for compressing databases of genomes from individuals of the same species when fast random access is desired. With Kuruppu et al.'s (SPIRE 2010) original implementation, a reference genome is…

Data Structures and Algorithms · Computer Science 2016-05-17 Anthony J. Cox, Andrea Farruggia, Travis Gagie, Simon J. Puglisi, Jouni Sirén

Approximating Optimal Bidirectional Macro Schemes

Lempel-Ziv is an easy-to-compute member of a wide family of so-called macro schemes; it restricts pointers to go in one direction only. Optimal bidirectional macro schemes are NP-complete to find, but they may provide much better…

Data Structures and Algorithms · Computer Science 2020-03-06 Luís M. S. Russo, Ana D. Correia, Gonzalo Navarro, Alexandre P. Francisco

At the Roots of Dictionary Compression: String Attractors

A well-known fact in the field of lossless text compression is that high-order entropy is a weak model when the input contains long repetitions. Motivated by this, decades of research have generated myriads of so-called dictionary…

Data Structures and Algorithms · Computer Science 2020-12-17 Dominik Kempa, Nicola Prezza

A simple online competitive adaptation of Lempel-Ziv compression with efficient random access support

We present a simple adaptation of the Lempel Ziv 78' (LZ78) compression scheme (IEEE Transactions on Information Theory, 1978) that supports efficient random access to the input string. Namely, given query access to the compressed…

Data Structures and Algorithms · Computer Science 2013-01-14 Akashnil Dutta, Reut Levi, Dana Ron, Ronitt Rubinfeld

Approximating binary longest common subsequence in almost-linear time

The Longest Common Subsequence (LCS) is a fundamental string similarity measure, and computing the LCS of two strings is a classic algorithms question. A textbook dynamic programming algorithm gives an exact algorithm in quadratic time, and…

Data Structures and Algorithms · Computer Science 2023-02-13 Xiaoyu He, Ray Li

Substring Compression Variations and LZ78-Derivates

We propose algorithms computing the semi-greedy Lempel-Ziv 78 (LZ78), the Lempel-Ziv Double (LZD), and the Lempel-Ziv-Miller-Wegman (LZMW) factorizations in linear time for integer alphabets. For LZD and LZMW, we additionally propose data…

Data Structures and Algorithms · Computer Science 2024-09-24 Dominik Köppl

Computing the LZ-End parsing: Easy to implement and practically efficient

The LZ-End parsing [Kreft & Navarro, 2011] of an input string yields compression competitive with the popular Lempel-Ziv 77 scheme, but also allows for efficient random access. Kempa and Kosolobov showed that the parsing can be computed in…

Data Structures and Algorithms · Computer Science 2024-09-18 Patrick Dinklage

Range Predecessor and Lempel-Ziv Parsing

The Lempel-Ziv parsing of a string (LZ77 for short) is one of the most important and widely-used algorithmic tools in data compression and string processing. We show that the Lempel-Ziv parsing of a string of length $n$ on an alphabet of…

Data Structures and Algorithms · Computer Science 2015-07-28 Djamal Belazzougui, Simon J. Puglisi

Pattern matching in Lempel-Ziv compressed strings: fast, simple, and deterministic

Countless variants of the Lempel-Ziv compression are widely used in many real-life applications. This paper is concerned with a natural modification of the classical pattern matching problem inspired by the popularity of such compression…

Data Structures and Algorithms · Computer Science 2011-04-22 Pawel Gawrychowski

Optimal Lempel-Ziv based lossy compression for memoryless data: how to make the right mistakes

Compression refers to encoding data using bits, so that the representation uses as few bits as possible. Compression could be lossless (i.e. encoded data can be recovered exactly from its representation) or lossy where the data is…

Information Theory · Computer Science 2012-10-19 Narayana Santhanam, Dharmendra Modha

Compression with the tudocomp Framework

We present a framework facilitating the implementation and comparison of text compression algorithms. We evaluate its features by a case study on two novel compression algorithms based on the Lempel-Ziv compression schemes that perform well…

Data Structures and Algorithms · Computer Science 2021-04-23 Patrick Dinklage, Johannes Fischer, Dominik Köppl, Marvin Löbel, Kunihiko Sadakane

Compressibility-Aware Quantum Algorithms on Strings

Sublinear time quantum algorithms have been established for many fundamental problems on strings. This work demonstrates that new, faster quantum algorithms can be designed when the string is highly compressible. We focus on two popular and…

Data Structures and Algorithms · Computer Science 2023-02-15 Daniel Gibney, Sharma V. Thankachan

Relative Lempel-Ziv Factorization for Efficient Storage and Retrieval of Web Collections

Compression techniques that support fast random access are a core component of any information system. Current state-of-the-art methods group documents into fixed-sized blocks and compress each block with a general-purpose adaptive…

Data Structures and Algorithms · Computer Science 2015-03-19 Christopher Hoobin, Simon J. Puglisi, Justin Zobel