Related papers: String Attractors
A well-known fact in the field of lossless text compression is that high-order entropy is a weak model when the input contains long repetitions. Motivated by this, decades of research have generated myriads of so-called dictionary…
The notion of \emph{string attractor} has recently been introduced in [Prezza, 2017] and studied in [Kempa and Prezza, 2018] to provide a unifying framework for known dictionary-based compressors. A string attractor for a word…
String attractors [STOC 2018] are combinatorial objects recently introduced to unify all known dictionary compression techniques in a single theory. A set $\Gamma\subseteq [1..n]$ is a $k$-attractor for a string $S\in[1..\sigma]^n$ if and…
The notion of string attractor has been introduced in [Kempa and Prezza, 2018] in the context of Data Compression and it represents a set of positions of a finite word in which all of its factors can be "attracted". The smallest size…
In today's data-centric world, fast and effective compression of data is paramount. To measure success towards the second goal, Kempa and Prezza [STOC2018] introduce the string attractor, a combinatorial object unifying dictionary-based…
We describe the first self-indexes able to count and locate pattern occurrences in optimal time within a space bounded by the size of the most popular dictionary compressors. To achieve this result we combine several recent findings,…
Shannon's entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad hoc measures are employed to estimate the repetitiveness of…
The problem of detecting and measuring the repetitiveness of one-dimensional strings has been extensively studied in data compression and text indexing. Our understanding of these issues has been significantly improved by the introduction…
We show that it is decidable, given an automatic sequence $\bf s$ and a constant $c$, whether all prefixes of $\bf s$ have a string attractor of size $\leq c$. Using a decision procedure based on this result, we show that all prefixes of…
String attractors are a combinatorial tool coming from the field of data compression. It is a set of positions within a word which intersects an occurrence of every factor. While one-sided infinite words admitting a finite string attractor…
A string attractor of a string $T[1..|T|]$ is a set of positions $\Gamma$ of $T$ such that any substring $w$ of $T$ has an occurrence that crosses a position in $\Gamma$, i.e., there is a position $i$ such that $w = T[i..i+|w|-1]$ and the…
We study the problem of supporting queries on a string $S$ of length $n$ within a space bounded by the size $\gamma$ of a string attractor for $S$. Recent works showed that random access on $S$ can be supported in optimal…
The sensitivity of a string compression algorithm $C$ asks how much the output size $C(T)$ for an input string $T$ can increase when a single character edit operation is performed on $T$. This notion enables one to measure the robustness of…
Firstly studied by Kempa and Prezza in 2018 as the cement of text compression algorithms, string attractors have become a compelling object of theoretical research within the community of combinatorics on words. In this context, they have…
The article focuses on word (or string) attractors, which are sets of positions related to the text compression efficiency of the underlying word. The article presents two combinatorial algorithms based on Suffix automata or Directed…
Computing the {\em matching statistics} of a string $P[1..m]$ with respect to a text $T[1..n]$ is a fundamental problem which has application to genome sequence comparison. In this paper, we study the problem of computing the matching…
The rise of repetitive datasets has lately generated a lot of interest in compressed self-indexes based on dictionary compression, a rich and heterogeneous family that exploits text repetitions in different ways. For each such compression…
Given a set of pattern strings $\mathcal{P}=\{P_1, P_2,\ldots P_k\}$ and a text string $S$, the classic dictionary matching problem is to report all occurrences of each pattern in $S$. We study the dictionary problem in the compressed…
A $k$-attractor is a combinatorial object unifying dictionary-based compression. It allows to compare the repetitiveness measures of different dictionary compressors such as Lempel-Ziv 77, the Burrows-Wheeler transform, straight line…
Unlike in statistical compression, where Shannon's entropy is a definitive lower bound, no such clear measure exists for the compressibility of repetitive sequences. Since statistical entropy does not capture repetitiveness, ad-hoc measures…