Related papers: Substring Complexity in Sublinear Space

200 papers

Unlike in statistical compression, where Shannon's entropy is a definitive lower bound, no such clear measure exists for the compressibility of repetitive sequences. Since statistical entropy does not capture repetitiveness, ad-hoc measures…

Data Structures and Algorithms · Computer Science 2021-01-18 Tomasz Kociumaka, Gonzalo Navarro, Nicola Prezza

Let $S_{T}(k)$ denote the set of distinct substrings of length $k$ in a string $T$, then the $k$-th substring complexity is defined by its cardinality $|S_{T}(k)|$. Recently, $\delta = \max \{ |S_{T}(k)| / k : k \ge 1 \}$ is shown to be a…

Data Structures and Algorithms · Computer Science 2022-05-26 Akiyoshi Kawamoto, Tomohiro I
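
The definition quoted in the abstract above, $\delta = \max \{ |S_{T}(k)| / k : k \ge 1 \}$, can be illustrated with a naive sketch. It is not the sublinear-space method of the listed paper. It simply enumerates all substrings, and the function names `substring_complexity` and `delta` are chosen here for illustration only.

```python
def substring_complexity(text: str, k: int) -> int:
    """Return |S_T(k)|, the number of distinct substrings of length k in text."""
    if k < 1 or k > len(text):
        return 0
    return len({text[i:i + k] for i in range(len(text) - k + 1)})


def delta(text: str) -> float:
    """Naive delta: the maximum of |S_T(k)| / k over 1 <= k <= len(text).

    Lengths k > len(text) contribute 0, so they can be skipped.
    This brute-force version stores every substring and is only
    suitable for short inputs.
    """
    return max(
        (substring_complexity(text, k) / k for k in range(1, len(text) + 1)),
        default=0.0,
    )


if __name__ == "__main__":
    for sample in ["aaaa", "abab", "abcd"]:
        print(sample, delta(sample))
```

For a highly repetitive string such as `"aaaa"`, every length has exactly one distinct substring, so the maximum is attained at $k = 1$.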

The problem of detecting and measuring the repetitiveness of one-dimensional strings has been extensively studied in data compression and text indexing. Our understanding of these issues has been significantly improved by the introduction…

Data Structures and Algorithms · Computer Science 2025-05-19 Lorenzo Carfagna, Giovanni Manzini, Giuseppe Romana, Marinella Sciortino, Cristian Urbina

The sensitivity of a string compression algorithm $C$ asks how much the output size $C(T)$ for an input string $T$ can increase when a single character edit operation is performed on $T$. This notion enables one to measure the robustness of…

Data Structures and Algorithms · Computer Science 2023-02-10 Tooru Akagi, Mitsuru Funakoshi, Shunsuke Inenaga

Let $S$ be a string of length $n$. In this paper we introduce the notion of string attractor: a subset of the string's positions $[1,n]$ such that every distinct substring of $S$ has an occurrence crossing one of the attractor's…

Data Structures and Algorithms · Computer Science 2017-09-20 Nicola Prezza
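
The string attractor definition in the abstract above can be checked by brute force on small inputs. The sketch below assumes the standard reading of the definition: positions are 1-based, and an occurrence $S[i..j]$ crosses position $p$ when $i \le p \le j$. The function name `is_string_attractor` is hypothetical and is not taken from the paper.

```python
from typing import Iterable


def is_string_attractor(s: str, positions: Iterable[int]) -> bool:
    """Check whether the 1-based positions form a string attractor of s.

    Every distinct substring of s must have at least one occurrence
    s[i..j] (1-based, inclusive) with i <= p <= j for some position p.
    Brute force over all occurrences; intended for short strings only.
    """
    pos = set(positions)
    n = len(s)
    all_substrings = set()
    covered = set()
    for i in range(n):
        for j in range(i + 1, n + 1):
            sub = s[i:j]  # occupies 1-based positions i+1 .. j
            all_substrings.add(sub)
            if any(i + 1 <= p <= j for p in pos):
                covered.add(sub)
    return all_substrings <= covered


if __name__ == "__main__":
    print(is_string_attractor("abab", {1, 2}))
    print(is_string_attractor("abab", {1}))
```

In `"abab"`, position 1 alone is not enough because no occurrence of the substring `"b"` crosses it, while positions 1 and 2 together cover every distinct substring.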

We show that the size $\gamma(t_n)$ of the smallest string attractor of the $n$th Thue-Morse word $t_n$ is 4 for any $n\geq 4$, disproving the conjecture by Mantaci et al. [ICTCS 2019] that it is $n$. We also show that $\delta(t_n) =…

Data Structures and Algorithms · Computer Science 2020-08-13 Kanaru Kutsukake, Takuya Matsumoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Computing the matching statistics of a string $P[1..m]$ with respect to a text $T[1..n]$ is a fundamental problem which has application to genome sequence comparison. In this paper, we study the problem of computing the matching…

Data Structures and Algorithms · Computer Science 2022-01-14 Younan Gao

Detecting and measuring repetitiveness of strings is a problem that has been extensively studied in data compression and text indexing. However, when the data are structured in a non-linear way, like in the context of two-dimensional…

Data Structures and Algorithms · Computer Science 2024-04-11 Giuseppe Romana, Marinella Sciortino, Cristian Urbina

The notion of string attractor has been introduced in [Kempa and Prezza, 2018] in the context of Data Compression and it represents a set of positions of a finite word in which all of its factors can be "attracted". The smallest size…

Formal Languages and Automata Theory · Computer Science 2022-06-02 Antonio Restivo, Giuseppe Romana, Marinella Sciortino

String attractors [STOC 2018] are combinatorial objects recently introduced to unify all known dictionary compression techniques in a single theory. A set $\Gamma\subseteq [1..n]$ is a $k$-attractor for a string $S\in[1..\sigma]^n$ if and…

Data Structures and Algorithms · Computer Science 2020-12-09 Dominik Kempa, Alberto Policriti, Nicola Prezza, Eva Rotenberg

A well-known fact in the field of lossless text compression is that high-order entropy is a weak model when the input contains long repetitions. Motivated by this, decades of research have generated myriads of so-called dictionary…

Data Structures and Algorithms · Computer Science 2020-12-17 Dominik Kempa, Nicola Prezza

We describe the first self-indexes able to count and locate pattern occurrences in optimal time within a space bounded by the size of the most popular dictionary compressors. To achieve this result we combine several recent findings,…

Data Structures and Algorithms · Computer Science 2019-09-06 Anders Roy Christiansen, Mikko Berggren Ettienne, Tomasz Kociumaka, Gonzalo Navarro, Nicola Prezza

In this work, we study the limits of compressed data structures, i.e., structures that support various queries on an input text $T\in\Sigma^n$ using space proportional to the size of $T$ in compressed form. Nearly all fundamental queries…

Data Structures and Algorithms · Computer Science 2025-10-23 Dominik Kempa, Tomasz Kociumaka

Suppose that we are given a string $s$ of length $n$ over an alphabet $\{0,1,\ldots,n^{O(1)}\}$ and $\delta$ is the string complexity of $s$, a known compression measure. We describe an index on $s$ with $O(\delta\log\frac{n}{\delta})$…

Data Structures and Algorithms · Computer Science 2026-04-15 Dmitry Kosolobov

The size $b$ of the smallest bidirectional macro scheme, which is arguably the most general copy-paste scheme to generate a given sequence, is considered to be the strictest reachable measure of repetitiveness. It is strictly lower-bounded…

Data Structures and Algorithms · Computer Science 2021-05-31 Gonzalo Navarro, Cristian Urbina

The normalized substring complexity $\delta$ of a string is defined as $\max_k \{c[k]/k\}$, where $c[k]$ is the number of distinct substrings of length $k$. This simply defined measure has recently attracted attention due to its…

Data Structures and Algorithms · Computer Science 2026-02-17 Gregory Kucherov, Yakov Nekrich

In today's data-centric world, fast and effective compression of data is paramount. To measure success towards the second goal, Kempa and Prezza [STOC2018] introduce the string attractor, a combinatorial object unifying dictionary-based…

Data Structures and Algorithms · Computer Science 2024-07-23 Philip Whittington

We initiate the study of sub-linear sketching and streaming techniques for estimating the output size of common dictionary compressors such as Lempel-Ziv '77, the run-length Burrows-Wheeler transform, and grammar compression. To this end,…

Data Structures and Algorithms · Computer Science 2024-08-20 Ruben Becker, Matteo Canton, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Nicola Prezza

In this paper we extend to two-dimensional data two recently introduced one-dimensional compressibility measures: the $\gamma$ measure defined in terms of the smallest string attractor, and the $\delta$ measure defined in terms of the…

Data Structures and Algorithms · Computer Science 2024-05-21 Lorenzo Carfagna, Giovanni Manzini

We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE)…

Data Structures and Algorithms · Computer Science 2007-06-11 Sofya Raskhodnikova, Dana Ron, Ronitt Rubinfeld, Adam Smith