Related papers: Substring Complexities on Run-length Compressed St…
Shannon's entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad hoc measures are employed to estimate the repetitiveness of…
The normalized substring complexity $\delta$ of a string is defined as $\max_k \{c[k]/k\}$, where $c[k]$ is the number of \textit{distinct} substrings of length $k$. This simply defined measure has recently attracted attention due to its…
Suppose that we are given a string $s$ of length $n$ over an alphabet $\{0,1,\ldots,n^{O(1)}\}$ and $\delta$ is the string complexity of $s$, a known compression measure. We describe an index on $s$ with $O(\delta\log\frac{n}{\delta})$…
In this work, we study the limits of compressed data structures, i.e., structures that support various queries on an input text $T\in\Sigma^n$ using space proportional to the size of $T$ in compressed form. Nearly all fundamental queries…
The random access problem for compressed strings is to build a data structure that efficiently supports accessing the character in position $i$ of a string given in compressed form. Given a grammar of size $n$ compressing a string of size…
Given a string of length $n$ that is composed of $r$ runs of letters from the alphabet $\{0,1,\ldots,\sigma{-}1\}$ such that $2 \le \sigma \le r$, we describe a data structure that, provided $r \le n / \log^{\omega(1)} n$, stores the string…
In the Shortest Superstring problem we are given a set of strings $S=\{s_1, \ldots, s_n\}$ and integer $\ell$ and the question is to decide whether there is a superstring $s$ of length at most $\ell$ containing all strings of $S$ as…
Unlike in statistical compression, where Shannon's entropy is a definitive lower bound, no such clear measure exists for the compressibility of repetitive sequences. Since statistical entropy does not capture repetitiveness, ad-hoc measures…
Suppose an oracle knows a string $S$ that is unknown to us and that we want to determine. The oracle can answer queries of the form "Is $s$ a substring of $S$?". In 1995, Skiena and Sundaram showed that, in the worst case, any algorithm…
In the classic longest common substring (LCS) problem, we are given two strings $S$ and $T$, each of length at most $n$, over an alphabet of size $\sigma$, and we are asked to find a longest string occurring as a fragment of both $S$ and…
We study structure of pure morphic and morphic sequences and prove the following result: the subword complexity of arbitrary morphic sequence is either $\Theta(n^{1+1/k})$ for some $k\in\mathbb N$, or is $O(n \log n)$.
Given a set of pattern strings $\mathcal{P}=\{P_1, P_2,\ldots P_k\}$ and a text string $S$, the classic dictionary matching problem is to report all occurrences of each pattern in $S$. We study the dictionary problem in the compressed…
Real-world data often comes in compressed form. Analyzing compressed data directly (without decompressing it) can save space and time by orders of magnitude. In this work, we focus on fundamental sequence comparison problems and try to…
We study the following substring suffix selection problem: given a substring of a string T of length n, compute its k-th lexicographically smallest suffix. This a natural generalization of the well-known question of computing the maximal…
For $0<\delta <1$ a $\delta$-subrepetition in a word is a factor which exponent is less than~2 but is not less than $1+\delta$ (the exponent of the factor is the ratio of the factor length to its minimal period). The $\delta$-subrepetition…
In this paper we investigate the problem of partitioning an input string T in such a way that compressing individually its parts via a base-compressor C gets a compressed output that is shorter than applying C over the entire T at once.…
String attractors [STOC 2018] are combinatorial objects recently introduced to unify all known dictionary compression techniques in a single theory. A set $\Gamma\subseteq [1..n]$ is a $k$-attractor for a string $S\in[1..\sigma]^n$ if and…
Two recent lower bounds on the compressibility of repetitive sequences, $\delta \le \gamma$, have received much attention. It has been shown that a length-$n$ string $S$ over an alphabet of size $\sigma$ can be represented within the…
The problem of detecting and measuring the repetitiveness of one-dimensional strings has been extensively studied in data compression and text indexing. Our understanding of these issues has been significantly improved by the introduction…
Repeat finding in strings has important applications in subfields such as computational biology. Surprisingly, all prior work on repeat finding did not consider the constraint on the locality of repeats. In this paper, we propose and study…