Related papers: String Attractors

At the Roots of Dictionary Compression: String Attractors

A well-known fact in the field of lossless text compression is that high-order entropy is a weak model when the input contains long repetitions. Motivated by this, decades of research have generated myriads of so-called dictionary…

Data Structures and Algorithms · Computer Science 2020-12-17 Dominik Kempa, Nicola Prezza

String Attractors and Combinatorics on Words

The notion of string attractor has recently been introduced in [Prezza, 2017] and studied in [Kempa and Prezza, 2018] to provide a unifying framework for known dictionary-based compressors. A string attractor for a word…

Data Structures and Algorithms · Computer Science 2019-07-11 Sabrina Mantaci, Antonio Restivo, Giuseppe Romana, Giovanna Rosone, Marinella Sciortino

String Attractors: Verification and Optimization

String attractors [STOC 2018] are combinatorial objects recently introduced to unify all known dictionary compression techniques in a single theory. A set $\Gamma\subseteq [1..n]$ is a $k$-attractor for a string $S\in[1..\sigma]^n$ if and…

Data Structures and Algorithms · Computer Science 2020-12-09 Dominik Kempa, Alberto Policriti, Nicola Prezza, Eva Rotenberg

String Attractors and Infinite Words

The notion of string attractor has been introduced in [Kempa and Prezza, 2018] in the context of Data Compression and it represents a set of positions of a finite word in which all of its factors can be "attracted". The smallest size…

Formal Languages and Automata Theory · Computer Science 2022-06-02 Antonio Restivo, Giuseppe Romana, Marinella Sciortino

Online String Attractors

In today's data-centric world, fast and effective compression of data is paramount. To measure success towards the second goal, Kempa and Prezza [STOC2018] introduce the string attractor, a combinatorial object unifying dictionary-based…

Data Structures and Algorithms · Computer Science 2024-07-23 Philip Whittington

Optimal-Time Dictionary-Compressed Indexes

We describe the first self-indexes able to count and locate pattern occurrences in optimal time within a space bounded by the size of the most popular dictionary compressors. To achieve this result we combine several recent findings,…

Data Structures and Algorithms · Computer Science 2019-09-06 Anders Roy Christiansen, Mikko Berggren Ettienne, Tomasz Kociumaka, Gonzalo Navarro, Nicola Prezza

Substring Complexity in Sublinear Space

Shannon's entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad hoc measures are employed to estimate the repetitiveness of…

Data Structures and Algorithms · Computer Science 2023-11-16 Giulia Bernardini, Gabriele Fici, Paweł Gawrychowski, Solon P. Pissis

Generalization of Repetitiveness Measures for Two-Dimensional Strings

The problem of detecting and measuring the repetitiveness of one-dimensional strings has been extensively studied in data compression and text indexing. Our understanding of these issues has been significantly improved by the introduction…

Data Structures and Algorithms · Computer Science 2025-05-19 Lorenzo Carfagna, Giovanni Manzini, Giuseppe Romana, Marinella Sciortino, Cristian Urbina

String Attractors for Automatic Sequences

We show that it is decidable, given an automatic sequence $\bf s$ and a constant $c$, whether all prefixes of $\bf s$ have a string attractor of size $\leq c$. Using a decision procedure based on this result, we show that all prefixes of…

Formal Languages and Automata Theory · Computer Science 2024-05-31 Luke Schaeffer, Jeffrey Shallit

String attractors and bi-infinite words

String attractors are a combinatorial tool coming from the field of data compression. A string attractor is a set of positions within a word that intersects an occurrence of every factor. While one-sided infinite words admitting a finite string attractor…

Combinatorics · Mathematics 2024-03-21 Pierre Béaur, France Gheeraert, Benjamin Hellouin de Menibus

The Smallest String Attractors of Fibonacci and Period-Doubling Words

A string attractor of a string $T[1..|T|]$ is a set of positions $\Gamma$ of $T$ such that any substring $w$ of $T$ has an occurrence that crosses a position in $\Gamma$, i.e., there is a position $i$ such that $w = T[i..i+|w|-1]$ and the…

Combinatorics · Mathematics 2026-02-19 Mutsunori Banbara, Hideo Bannai, Peaker Guo, Dominik Köppl, Takuya Mieno, Yoshio Okamoto

Optimal Rank and Select Queries on Dictionary-Compressed Text

We study the problem of supporting queries on a string $S$ of length $n$ within a space bounded by the size $\gamma$ of a string attractor for $S$. Recent works showed that random access on $S$ can be supported in optimal…

Data Structures and Algorithms · Computer Science 2018-12-24 Nicola Prezza

Sensitivity of string compressors and repetitiveness measures

The sensitivity of a string compression algorithm $C$ asks how much the output size $C(T)$ for an input string $T$ can increase when a single character edit operation is performed on $T$. This notion enables one to measure the robustness of…

Data Structures and Algorithms · Computer Science 2023-02-10 Tooru Akagi, Mitsuru Funakoshi, Shunsuke Inenaga

String attractors of some simple-Parry automatic sequences

First studied by Kempa and Prezza in 2018 as the cement of text compression algorithms, string attractors have become a compelling object of theoretical research within the community of combinatorics on words. In this context, they have…

Combinatorics · Mathematics 2024-03-25 France Gheeraert, Giuseppe Romana, Manon Stipulanti

Checking and producing word attractors

The article focuses on word (or string) attractors, which are sets of positions related to the text compression efficiency of the underlying word. The article presents two combinatorial algorithms based on Suffix automata or Directed…

Data Structures and Algorithms · Computer Science 2025-09-11 Marie-Pierre Béal, Maxime Crochemore, Giuseppe Romana

Computing Matching Statistics on Repetitive Texts

Computing the matching statistics of a string $P[1..m]$ with respect to a text $T[1..n]$ is a fundamental problem which has application to genome sequence comparison. In this paper, we study the problem of computing the matching…

Data Structures and Algorithms · Computer Science 2022-01-14 Younan Gao

Universal Compressed Text Indexing

The rise of repetitive datasets has lately generated a lot of interest in compressed self-indexes based on dictionary compression, a rich and heterogeneous family that exploits text repetitions in different ways. For each such compression…

Data Structures and Algorithms · Computer Science 2020-12-17 Gonzalo Navarro, Nicola Prezza

Compressed Dictionary Matching on Run-Length Encoded Strings

Given a set of pattern strings $\mathcal{P}=\{P_1, P_2,\ldots, P_k\}$ and a text string $S$, the classic dictionary matching problem is to report all occurrences of each pattern in $S$. We study the dictionary problem in the compressed…

Data Structures and Algorithms · Computer Science 2025-09-04 Philip Bille, Inge Li Gørtz, Simon J. Puglisi, Simon R. Tarnow

The 2-Attractor Problem is NP-Complete

A $k$-attractor is a combinatorial object unifying dictionary-based compression. It allows one to compare the repetitiveness measures of different dictionary compressors such as Lempel-Ziv 77, the Burrows-Wheeler transform, straight line…

Computational Complexity · Computer Science 2024-02-08 Janosch Fuchs, Philip Whittington

Towards a Definitive Compressibility Measure for Repetitive Sequences

Unlike in statistical compression, where Shannon's entropy is a definitive lower bound, no such clear measure exists for the compressibility of repetitive sequences. Since statistical entropy does not capture repetitiveness, ad-hoc measures…

Data Structures and Algorithms · Computer Science 2021-01-18 Tomasz Kociumaka, Gonzalo Navarro, Nicola Prezza