Related papers: Approximating Optimal Bidirectional Macro Schemes

Sublinear Algorithms for Approximating String Compressibility

We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE)…

Data Structures and Algorithms · Computer Science 2007-06-11 Sofya Raskhodnikova, Dana Ron, Ronitt Rubinfeld, Adam Smith

Lempel-Ziv Networks

Sequence processing has long been a central area of machine learning research. Recurrent neural nets have been successful in processing sequences for a number of tasks; however, they are known to be both ineffective and computationally…

Machine Learning · Computer Science 2022-11-28 Rebecca Saul, Mohammad Mahmudul Alam, John Hurwitz, Edward Raff, Tim Oates, James Holt

Compression with the tudocomp Framework

We present a framework facilitating the implementation and comparison of text compression algorithms. We evaluate its features by a case study on two novel compression algorithms based on the Lempel-Ziv compression schemes that perform well…

Data Structures and Algorithms · Computer Science 2021-04-23 Patrick Dinklage, Johannes Fischer, Dominik Köppl, Marvin Löbel, Kunihiko Sadakane

Bit-Optimal Lempel-Ziv compression

One of the most famous and investigated lossless data-compression schemes is the one introduced by Lempel and Ziv about 40 years ago. This compression scheme is known as "dictionary-based compression" and consists of squeezing an input…

Data Structures and Algorithms · Computer Science 2008-02-07 Paolo Ferragina, Igor Nitto, Rossano Venturini

Lempel-Ziv-like Parsing in Small Space

Lempel-Ziv (LZ77 or, briefly, LZ) is one of the most effective and widely-used compressors for repetitive texts. However, the existing efficient methods computing the exact LZ parsing have to use linear or close to linear space to index the…

Data Structures and Algorithms · Computer Science 2020-05-12 Dmitry Kosolobov, Daniel Valenzuela, Gonzalo Navarro, Simon J. Puglisi

Hierarchical Relative Lempel-Ziv Compression

Relative Lempel-Ziv (RLZ) parsing is a dictionary compression method in which a string $S$ is compressed relative to a second string $R$ (called the reference) by parsing $S$ into a sequence of substrings that occur in $R$. RLZ is…

Data Structures and Algorithms · Computer Science 2022-08-25 Philip Bille, Inge Li Gørtz, Simon J. Puglisi, Simon R. Tarnow

Sequential Recurrence-Based Multidimensional Universal Source Coding of Lempel-Ziv Type

We define an algorithm that parses multidimensional arrays sequentially into mainly unrepeated but nested multidimensional sub-arrays of increasing size, and show that the resulting sub-block pointer encoder compresses almost every…

Information Theory · Computer Science 2014-08-20 Tyll Krueger, Guido Montufar, Ruedi Seiler, Rainer Siegmund-Schultze

LZRR: LZ77 Parsing with Right Reference

Lossless data compression has been widely studied in computer science. One of the most widely used lossless data compression methods is Lempel-Ziv (LZ) 77 parsing, which achieves a high compression ratio. Bidirectional (a.k.a. macro) parsing is a…

Data Structures and Algorithms · Computer Science 2018-12-12 Takaaki Nishimoto, Yasuo Tabei

Lempel-Ziv Factorization May Be Harder Than Computing All Runs

The complexity of computing the Lempel-Ziv factorization and the set of all runs (= maximal repetitions) is studied in the decision tree model of computation over an ordered alphabet. It is known that both these problems can be solved by RAM…

Data Structures and Algorithms · Computer Science 2014-09-22 Dmitry Kosolobov

Optimum Search Schemes for Approximate String Matching Using Bidirectional FM-Index

Finding approximate occurrences of a pattern in a text using a full-text index is a central problem in bioinformatics and has been extensively researched. Bidirectional indices have opened new possibilities in this regard allowing the…

Data Structures and Algorithms · Computer Science 2018-03-06 Kiavash Kianfar, Christopher Pockrandt, Bahman Torkamandi, Haochen Luo, Knut Reinert

Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

We study the approximate string matching and regular expression matching problem for the case when the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that…

Data Structures and Algorithms · Computer Science 2007-05-23 Philip Bille, Rolf Fagerberg, Inge Li Goertz

Optimal Subsampling for Large Sample Logistic Regression

For massive data, the family of subsampling algorithms is popular to downsize the data volume and reduce computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where…

Computation · Statistics 2019-06-27 HaiYing Wang, Rong Zhu, Ping Ma

RLZAP: Relative Lempel-Ziv with Adaptive Pointers

Relative Lempel-Ziv (RLZ) is a popular algorithm for compressing databases of genomes from individuals of the same species when fast random access is desired. With Kuruppu et al.'s (SPIRE 2010) original implementation, a reference genome is…

Data Structures and Algorithms · Computer Science 2016-05-17 Anthony J. Cox, Andrea Farruggia, Travis Gagie, Simon J. Puglisi, Jouni Sirén

Optimal LZ-End Parsing is Hard

LZ-End is a variant of the well-known Lempel-Ziv parsing family such that each phrase of the parsing has a previous occurrence, with the additional constraint that the previous occurrence must end at the end of a previous phrase. LZ-End was…

Data Structures and Algorithms · Computer Science 2023-02-07 Hideo Bannai, Mitsuru Funakoshi, Kazuhiro Kurita, Yuto Nakashima, Kazuhisa Seto, Takeaki Uno

Range Predecessor and Lempel-Ziv Parsing

The Lempel-Ziv parsing of a string (LZ77 for short) is one of the most important and widely-used algorithmic tools in data compression and string processing. We show that the Lempel-Ziv parsing of a string of length $n$ on an alphabet of…

Data Structures and Algorithms · Computer Science 2015-07-28 Djamal Belazzougui, Simon J. Puglisi

Universal Indexes for Highly Repetitive Document Collections

Indexing highly repetitive collections has become a relevant problem with the emergence of large repositories of versioned documents, among other applications. These collections may reach huge sizes, but are formed mostly of documents that…

Information Retrieval · Computer Science 2016-05-25 Francisco Claude, Antonio Fariña, Miguel A. Martínez-Prieto, Gonzalo Navarro

Near-Optimal Convex Simple Bilevel Optimization with a Bisection Method

This paper studies a class of simple bilevel optimization problems where we minimize a composite convex function at the upper-level subject to a composite convex lower-level problem. Existing methods either provide asymptotic guarantees for…

Optimization and Control · Mathematics 2024-03-06 Jiulin Wang, Xu Shi, Rujun Jiang

On the non-randomness of maximum Lempel Ziv complexity sequences of finite size

Random sequences attain the highest entropy rate. The entropy rate of an ergodic source can be estimated using the Lempel Ziv complexity measure, yet the exact entropy rate value is only reached in the infinite limit. We prove…

Chaotic Dynamics · Physics 2013-11-05 E. Estevez-Rams, R. Lora Serrano, B. Aragón Fernández, I. Brito Reyes

On the Approximation Ratio of Ordered Parsings

Shannon's entropy is a clear lower bound for statistical compression. The situation is not so well understood for dictionary-based compression. A plausible lower bound is $b$, the least number of phrases of a general bidirectional parse of…

Data Structures and Algorithms · Computer Science 2019-10-29 Gonzalo Navarro, Carlos Ochoa, Nicola Prezza

Simpler and Faster Lempel Ziv Factorization

We present a new, simple, and efficient approach for computing the Lempel-Ziv (LZ77) factorization of a string in linear time, based on suffix arrays. Computational experiments on various data sets show that our approach constantly…

Data Structures and Algorithms · Computer Science 2013-01-21 Keisuke Goto, Hideo Bannai