Related papers: A Compressed-Gap Data-Aware Measure
We present a data structure that stores a sequence $s[1..n]$ over alphabet $[1..\sigma]$ in $n\Ho(s) + o(n)(\Ho(s){+}1)$ bits, where $\Ho(s)$ is the zero-order entropy of $s$. This structure supports the queries \access, \rank\ and \select,…
In the range $\alpha$-majority query problem, we are given a sequence $S[1..n]$ and a fixed threshold $\alpha \in (0, 1)$, and are asked to preprocess $S$ such that, given a query range $[i..j]$, we can efficiently report the symbols that…
We introduce the first grammar-compressed representation of a sequence that supports searches in time that depends only logarithmically on the size of the grammar. Given a text $T[1..u]$ that is represented by a (context-free) grammar of…
Given a string $S$ of length $N$ on a fixed alphabet of $\sigma$ symbols, a grammar compressor produces a context-free grammar $G$ of size $n$ that generates $S$ and only $S$. In this paper we describe data structures to support the…
We introduce space- and time-efficient algorithms and data structures for the offline set intersection problem. We show that a sorted integer set $S \subseteq [0{..}u)$ of $n$ elements can be represented using compressed space while…
The compressed indexing problem is to preprocess a string $S$ of length $n$ into a compressed representation that supports pattern matching queries. That is, given a string $P$ of length $m$ report all occurrences of $P$ in $S$. We present…
The Planar Graph Metric Compression Problem is to compactly encode the distances among $k$ nodes in a planar graph of size $n$. Two na\"ive solutions are to store the graph using $O(n)$ bits, or to explicitly store the distance matrix with…
We study the problem of supporting queries on a string $S$ of length $n$ within a space bounded by the size $\gamma$ of a string attractor for $S$. Recent works showed that random access on $S$ can be supported in optimal…
We introduce a new distance-preserving compact representation of multi-dimensional point-sets. Given $n$ points in a $d$-dimensional space where each coordinate is represented using $B$ bits (i.e., $dB$ bits per point), it produces a…
Suppose that we are given a string $s$ of length $n$ over an alphabet $\{0,1,\ldots,n^{O(1)}\}$ and $\delta$ is the string complexity of $s$, a known compression measure. We describe an index on $s$ with $O(\delta\log\frac{n}{\delta})$…
We consider the problem of storing a dynamic string $S$ over an alphabet $\Sigma=\{\,1,\ldots,\sigma\,\}$ in compressed form. Our representation supports insertions and deletions of symbols and answers three fundamental queries:…
Grammar based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. In this paper, we present a novel…
In this paper, we study the following problem: given $n$ subsets $S_1, \dots, S_n$ of an integer universe $U = \{0,\dots, u-1\}$, having total cardinality $N = \sum_{i=1}^n |S_i|$, find a prefix-free encoding $enc : U \rightarrow \{0,1\}^+$…
We describe the first self-indexes able to count and locate pattern occurrences in optimal time within a space bounded by the size of the most popular dictionary compressors. To achieve this result we combine several recent findings,…
Let $s$ be a string of length $n$ over an alphabet of constant size $\sigma$ and let $c$ and $\epsilon$ be constants with (1 \geq c \geq 0) and (\epsilon > 0). Using (O (n)) time, (O (n^c)) bits of memory and one pass we can always encode…
Explicit high-order feature interactions efficiently capture essential structural knowledge about the data of interest and have been used for constructing generative models. We present a supervised discriminative High-Order Parametric…
Most of the attention in statistical compression is given to the space used by the compressed sequence, a problem completely solved with optimal prefix codes. However, in many applications, the storage space used to represent the prefix…
We solve an open problem related to an optimal encoding of a straight line program (SLP), a canonical form of grammar compression deriving a single string deterministically. We show that an information-theoretic lower bound for representing…
In the last decades, the necessity to process massive amounts of textual data fueled the development of compressed text indexes: data structures efficiently answering queries on a given text while occupying space proportional to the…
A Random Access query to a string $T\in [0..\sigma)^n$ asks for the character $T[i]$ at a given position $i\in [0..n)$. In $O(n\log\sigma)$ bits of space, this fundamental task admits constant-time queries. While this is optimal in the…