Related papers: String Attractors: Verification and Optimization

Online String Attractors

In today's data-centric world, fast and effective compression of data is paramount. To measure success towards the second goal, Kempa and Prezza [STOC2018] introduce the string attractor, a combinatorial object unifying dictionary-based…

Data Structures and Algorithms · Computer Science 2024-07-23 Philip Whittington

String Attractors

Let $S$ be a string of length $n$. In this paper we introduce the notion of \emph{string attractor}: a subset of the string's positions $[1,n]$ such that every distinct substring of $S$ has an occurrence crossing one of the attractor's…

Data Structures and Algorithms · Computer Science 2017-09-20 Nicola Prezza

At the Roots of Dictionary Compression: String Attractors

A well-known fact in the field of lossless text compression is that high-order entropy is a weak model when the input contains long repetitions. Motivated by this, decades of research have generated myriads of so-called dictionary…

Data Structures and Algorithms · Computer Science 2020-12-17 Dominik Kempa, Nicola Prezza

The 2-Attractor Problem is NP-Complete

A $k$-attractor is a combinatorial object unifying dictionary-based compression. It allows one to compare the repetitiveness measures of different dictionary compressors such as Lempel-Ziv 77, the Burrows-Wheeler transform, straight line…

Computational Complexity · Computer Science 2024-02-08 Janosch Fuchs, Philip Whittington

String Attractors and Combinatorics on Words

The notion of \emph{string attractor} has recently been introduced in [Prezza, 2017] and studied in [Kempa and Prezza, 2018] to provide a unifying framework for known dictionary-based compressors. A string attractor for a word…

Data Structures and Algorithms · Computer Science 2019-07-11 Sabrina Mantaci, Antonio Restivo, Giuseppe Romana, Giovanna Rosone, Marinella Sciortino

String Attractors and Infinite Words

The notion of string attractor has been introduced in [Kempa and Prezza, 2018] in the context of Data Compression and it represents a set of positions of a finite word in which all of its factors can be "attracted". The smallest size…

Formal Languages and Automata Theory · Computer Science 2022-06-02 Antonio Restivo, Giuseppe Romana, Marinella Sciortino

Optimal-Time Dictionary-Compressed Indexes

We describe the first self-indexes able to count and locate pattern occurrences in optimal time within a space bounded by the size of the most popular dictionary compressors. To achieve this result we combine several recent findings,…

Data Structures and Algorithms · Computer Science 2019-09-06 Anders Roy Christiansen, Mikko Berggren Ettienne, Tomasz Kociumaka, Gonzalo Navarro, Nicola Prezza

Substring Complexity in Sublinear Space

Shannon's entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad hoc measures are employed to estimate the repetitiveness of…

Data Structures and Algorithms · Computer Science 2023-11-16 Giulia Bernardini, Gabriele Fici, Paweł Gawrychowski, Solon P. Pissis

Checking and producing word attractors

The article focuses on word (or string) attractors, which are sets of positions related to the text compression efficiency of the underlying word. The article presents two combinatorial algorithms based on Suffix automata or Directed…

Data Structures and Algorithms · Computer Science 2025-09-11 Marie-Pierre Béal, Maxime Crochemore, Giuseppe Romana

Optimal Time Random Access to Grammar-Compressed Strings in Small Space

The random access problem for compressed strings is to build a data structure that efficiently supports accessing the character in position $i$ of a string given in compressed form. Given a grammar of size $n$ compressing a string of size…

Data Structures and Algorithms · Computer Science 2015-01-27 Patrick Hagge Cording

Optimal Rank and Select Queries on Dictionary-Compressed Text

We study the problem of supporting queries on a string $S$ of length $n$ within a space bounded by the size $\gamma$ of a string attractor for $S$. Recent works showed that random access on $S$ can be supported in optimal…

Data Structures and Algorithms · Computer Science 2018-12-24 Nicola Prezza

The Smallest String Attractors of Fibonacci and Period-Doubling Words

A string attractor of a string $T[1..|T|]$ is a set of positions $\Gamma$ of $T$ such that any substring $w$ of $T$ has an occurrence that crosses a position in $\Gamma$, i.e., there is a position $i$ such that $w = T[i..i+|w|-1]$ and the…

Combinatorics · Mathematics 2026-02-19 Mutsunori Banbara, Hideo Bannai, Peaker Guo, Dominik Köppl, Takuya Mieno, Yoshio Okamoto

String attractors of some simple-Parry automatic sequences

First studied by Kempa and Prezza in 2018 as the cement of text compression algorithms, string attractors have become a compelling object of theoretical research within the community of combinatorics on words. In this context, they have…

Combinatorics · Mathematics 2024-03-25 France Gheeraert, Giuseppe Romana, Manon Stipulanti

Adaptive Learning of Compressible Strings

Suppose an oracle knows a string $S$ that is unknown to us and that we want to determine. The oracle can answer queries of the form "Is $s$ a substring of $S$?". In 1995, Skiena and Sundaram showed that, in the worst case, any algorithm…

Data Structures and Algorithms · Computer Science 2021-10-20 Gabriele Fici, Nicola Prezza, Rossano Venturini

Near-Optimal Property Testers for Pattern Matching

The classic exact pattern matching problem, given two strings -- a pattern $P$ of length $m$ and a text $T$ of length $n$ -- asks whether $P$ occurs as a substring of $T$. A property tester for the problem needs to distinguish (with high…

Data Structures and Algorithms · Computer Science 2025-10-21 Ce Jin, Tomasz Kociumaka

Fast Searching in Packed Strings

Given strings $P$ and $Q$ the (exact) string matching problem is to find all positions of substrings in $Q$ matching $P$. The classical Knuth-Morris-Pratt algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear time…

Data Structures and Algorithms · Computer Science 2010-09-08 Philip Bille

Faster Approximate Pattern Matching: A Unified Approach

Approximate pattern matching is a natural and well-studied problem on strings: Given a text $T$, a pattern $P$, and a threshold $k$, find (the starting positions of) all substrings of $T$ that are at distance at most $k$ from $P$. We…

Data Structures and Algorithms · Computer Science 2020-11-17 Panagiotis Charalampopoulos, Tomasz Kociumaka, Philip Wellnitz

Optimal Top-k Document Retrieval

Let $\mathcal{D}$ be a collection of $D$ documents, which are strings over an alphabet of size $\sigma$, of total length $n$. We describe a data structure that uses linear space and reports $k$ most relevant documents that contain a…

Data Structures and Algorithms · Computer Science 2013-08-02 Gonzalo Navarro, Yakov Nekrich

Can You Solve Closest String Faster than Exhaustive Search?

We study the fundamental problem of finding the best string to represent a given set, in the form of the Closest String problem: Given a set $X \subseteq \Sigma^d$ of $n$ strings, find the string $x^*$ minimizing the radius of the smallest…

Computational Complexity · Computer Science 2023-05-30 Amir Abboud, Nick Fischer, Elazar Goldenberg, Karthik C. S., Ron Safier

Faster Approximate Pattern Matching in Compressed Repetitive Texts

Motivated by the imminent growth of massive, highly redundant genomic databases, we study the problem of compressing a string database while simultaneously supporting fast random access, substring extraction and pattern matching to the…

Data Structures and Algorithms · Computer Science 2012-11-01 Travis Gagie, Paweł Gawrychowski, Christopher Hoobin, Simon J. Puglisi