Related papers: Dynamic Suffix Array in Optimal Compressed Space
In the last decades, the necessity to process massive amounts of textual data fueled the development of compressed text indexes: data structures efficiently answering queries on a given text while occupying space proportional to the…
The suffix array $SA[1..n]$ of a text $T$ of length $n$ is a permutation of $\{1,\ldots,n\}$ describing the lexicographical ordering of suffixes of $T$, and it is considered to be among of the most important data structures in string…
The Suffix Array $SA_S[1\ldots n]$ of an $n$-length string $S$ is a lexicographically sorted array of the suffixes of $S$. The suffix array is one of the most well known and widely used data structures in string algorithms. We present a…
In this work, we study the limits of compressed data structures, i.e., structures that support various queries on an input text $T\in\Sigma^n$ using space proportional to the size of $T$ in compressed form. Nearly all fundamental queries…
The suffix array and the suffix tree are the two most fundamental data structures for string processing. For a length-$n$ text, however, they use $\Theta(n \log n)$ bits of space, which is often too costly. To address this, Grossi and…
The Suffix Array SA(S) of a string S[1 ... n] is an array containing all the suffixes of S sorted by lexicographic order. The suffix array is one of the most well known indexing data structures, and it functions as a key tool in many string…
We consider the problem of storing a dynamic string $S$ over an alphabet $\Sigma=\{\,1,\ldots,\sigma\,\}$ in compressed form. Our representation supports insertions and deletions of symbols and answers three fundamental queries:…
A compressed self-index stores a string in compressed form while supporting locate queries without decompression. For highly repetitive strings (arising in web crawls, versioned documents, and genomic collections), static self-indexes can…
Much research has been devoted to optimizing algorithms of the Lempel-Ziv (LZ) 77 family, both in terms of speed and memory requirements. Binary search trees and suffix trees (ST) are data structures that have been often used for this…
In the range $\alpha$-majority query problem, we are given a sequence $S[1..n]$ and a fixed threshold $\alpha \in (0, 1)$, and are asked to preprocess $S$ such that, given a query range $[i..j]$, we can efficiently report the symbols that…
Suffix Array (SA) is a cardinal data structure in many pattern matching applications, including data compression, plagiarism detection and sequence alignment. However, as the volumes of data increase abruptly, the construction of SA is not…
The suffix tree is arguably the most fundamental data structure on strings: introduced by Weiner (SWAT 1973) and McCreight (JACM 1976), it allows solving a myriad of computational problems on strings in linear time. Motivated by its large…
We show that the compressed suffix array and the compressed suffix tree of a string $T$ can be built in $O(n)$ deterministic time using $O(n\log\sigma)$ bits of space, where $n$ is the string length and $\sigma$ is the alphabet size.…
We study the fundamental question of how efficiently suffix array entries can be accessed when the array cannot be stored explicitly. The suffix array $SA_T[1..n]$ of a text $T$ of length $n$ encodes the lexicographic order of its suffixes…
Spaced seeds are important tools for similarity search in bioinformatics, and using several seeds together often significantly improves their performance. With existing approaches, however, for each seed we keep a separate linear-size data…
In this paper, we present a new data structure called the packed compact trie (packed c-trie) which stores a set $S$ of $k$ strings of total length $n$ in $n \log\sigma + O(k \log n)$ bits of space and supports fast pattern matching queries…
Suffix trees are a fundamental data structure in stringology, but their space usage, though linear, is an important problem for its applications. We design and implement a new compressed suffix tree targeted to highly repetitive texts, such…
The well-known dictionary-based algorithms of the Lempel-Ziv (LZ) 77 family are the basis of several universal lossless compression techniques. These algorithms are asymmetric regarding encoding/decoding time and memory requirements, with…
A Longest Common Extension (LCE) query on a text $T$ of length $N$ asks for the length of the longest common prefix of suffixes starting at given two positions. We show that the signature encoding $\mathcal{G}$ of size $w = O(\min(z \log N…
We introduce a new family of compressed data structures to efficiently store and query large string dictionaries in main memory. Our main technique is a combination of hierarchical Front-coding with ideas from longest-common-prefix…