Related papers: Efficient Fully-Compressed Sequence Representation…
We describe a data structure that supports access, rank and select queries, as well as symbol insertions and deletions, on a string $S[1,n]$ over alphabet $[1..\sigma]$ in time $O(\lg n/\lg\lg n)$, which is optimal even on binary sequences…
We consider the problem of storing a dynamic string $S$ over an alphabet $\Sigma=\{\,1,\ldots,\sigma\,\}$ in compressed form. Our representation supports insertions and deletions of symbols and answers three fundamental queries:…
Let $s$ be a string of length $n$ over an alphabet of constant size $\sigma$ and let $c$ and $\epsilon$ be constants with (1 \geq c \geq 0) and (\epsilon > 0). Using (O (n)) time, (O (n^c)) bits of memory and one pass we can always encode…
Most of the attention in statistical compression is given to the space used by the compressed sequence, a problem completely solved with optimal prefix codes. However, in many applications, the storage space used to represent the prefix…
In this paper, we consider the problem of efficiently representing a set $S$ of $n$ items out of a universe $U=\{0,...,u-1\}$ while supporting a number of operations on it. Let $G=g_1...g_n$ be the gap stream associated with $S$, $gap$ its…
Given a string of length $n$ that is composed of $r$ runs of letters from the alphabet $\{0,1,\ldots,\sigma{-}1\}$ such that $2 \le \sigma \le r$, we describe a data structure that, provided $r \le n / \log^{\omega(1)} n$, stores the string…
In the last decades, the necessity to process massive amounts of textual data fueled the development of compressed text indexes: data structures efficiently answering queries on a given text while occupying space proportional to the…
In the range $\alpha$-majority query problem, we are given a sequence $S[1..n]$ and a fixed threshold $\alpha \in (0, 1)$, and are asked to preprocess $S$ such that, given a query range $[i..j]$, we can efficiently report the symbols that…
Suppose that we are given a string $s$ of length $n$ over an alphabet $\{0,1,\ldots,n^{O(1)}\}$ and $\delta$ is the string complexity of $s$, a known compression measure. We describe an index on $s$ with $O(\delta\log\frac{n}{\delta})$…
We consider the problem of representing, in a compressed format, a bit-vector $S$ of $m$ bits with $n$ 1s, supporting the following operations, where $b \in \{0, 1 \}$: $rank_b(S,i)$ returns the number of occurrences of bit $b$ in the…
We propose new succinct representations of ordinal trees, which have been studied extensively. It is known that any $n$-node static tree can be represented in $2n + o(n)$ bits and a number of operations on the tree can be supported in…
We present a compressed representation of tries based on top tree compression [ICALP 2013] that works on a standard, comparison-based, pointer machine model of computation and supports efficient prefix search queries. Namely, we show how to…
Given $d$ strings over the alphabet $\{0,1,\ldots,\sigma{-}1\}$, the classical Aho--Corasick data structure allows us to find all $occ$ occurrences of the strings in any text $T$ in $O(|T| + occ)$ time using $O(m\log m)$ bits of space,…
Given a string $S$ of length $N$ on a fixed alphabet of $\sigma$ symbols, a grammar compressor produces a context-free grammar $G$ of size $n$ that generates $S$ and only $S$. In this paper we describe data structures to support the…
Suppose we are asked to preprocess a string \(s [1..n]\) such that later, given a substring's endpoints, we can quickly count how many distinct characters it contains. In this paper we give a data structure for this problem that takes \(n…
Let $A$ be a static array storing $n$ elements from a totally ordered set. We present a data structure of optimal size at most $n\log_2(3+2\sqrt{2})+o(n)$ bits that allows us to answer the following queries on $A$ in constant time, without…
Universal compression of patterns of sequences generated by independently identically distributed (i.i.d.) sources with unknown, possibly large, alphabets is investigated. A pattern is a sequence of indices that contains all consecutive…
The compressed indexing problem is to preprocess a string $S$ of length $n$ into a compressed representation that supports pattern matching queries. That is, given a string $P$ of length $m$ report all occurrences of $P$ in $S$. We present…
Bit arrays, or bitmaps, are used to significantly speed up set operations in several areas, such as data warehousing, information retrieval, and data mining, to cite a few. However, bitmaps usually use a large storage space, thus requiring…
This thesis concerns sequential-access data compression, i.e., by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by…