Related papers: Faster Wavelet Tree Queries
Rank and select queries are basic operations on sequences, with applications in compressed text indexes and other space-efficient data structures. One of the standard data structures supporting these queries is the wavelet tree. In this…
Wavelet trees are widely used in the representation of sequences, permutations, text collections, binary relations, discrete points, and other succinct data structures. We show, however, that this still falls short of exploiting all of the…
The wavelet tree (Grossi et al. [SODA, 2003]) and wavelet matrix (Claude et al. [Inf. Syst., 47:15--32, 2015]) are compact indices for texts over an alphabet $[0,\sigma)$ that support rank, select and access queries in $O(\lg \sigma)$ time.…
Rank and select queries on bitmaps are essential building bricks of many compressed data structures, including text indexes, membership and range supporting spatial data structures, compressed graphs, and more. Theoretically considered yet…
Bit vectors are fundamental building blocks of many succinct data structures. They can be used to represent graphs, are an important part of many text indices in the form of the wavelet tree, and can be used to encode ordered sequences of…
We consider succinct data structures for representing a set of $n$ horizontal line segments in the plane given in rank space to support \emph{segment access}, \emph{segment selection}, and \emph{segment rank} queries. A segment access query…
Ranked document retrieval is a fundamental task in search engines. Such queries are solved with inverted indexes that require additional 45%-80% of the compressed text space, and take tens to hundreds of microseconds per query. In this…
We show how to use a balanced wavelet tree as a data structure that stores a list of numbers and supports efficient {\em range quantile queries}. A range quantile query takes a rank and the endpoints of a sublist and returns the number with…
An indexed sequence of strings is a data structure for storing a string sequence that supports random access, searching, range counting and analytics operations, both for exact matches and prefix search. String sequences lie at the core of…
Rank and select data structures seek to preprocess a bit vector to quickly answer two kinds of queries: rank(i) gives the number of 1 bits in slots 0 through i, and select(j) gives the first slot s with rank(s) = j. A succinct data…
We present an improved wavelet tree construction algorithm and discuss its applications to a number of rank/select problems for integer keys and strings. Given a string of length n over an alphabet of size $\sigma\leq n$, our method builds…
Suffix trees are a fundamental data structure in stringology, but their space usage, though linear, is an important problem for its applications. We design and implement a new compressed suffix tree targeted to highly repetitive texts, such…
It has been shown in the indexing literature that there is an essential difference between prefix/range searches on the one hand, and predecessor/rank searches on the other hand, in that the former provably allows faster query resolution.…
This paper studies the performances of BERT combined with tree structure in short sentence ranking task. In retrieval-based question answering system, we retrieve the most similar question of the query question by ranking all the questions…
This work presents a discovery to advance the wisdom in a particular Succinct Data Structure: Wavelet Tree (Grossi, Gupta, and Vitter 2003). The discovery is first made by showing the feasibility of Reversed Indexes = Values: for integers…
One of the central problems in the design of compressed data structures is the efficient support for rank and select queries on bitvectors. These two operations form the backbone of more complex data structures (such as wavelet trees) used…
The rank problem in succinct data structures asks to preprocess an array A[1..n] of bits into a data structure using as close to n bits as possible, and answer queries of the form rank(k) = Sum_{i=1}^k A[i]. The problem has been intensely…
Succinct data structures give space-efficient representations of large amounts of data without sacrificing performance. They rely one cleverly designed data representations and algorithms. We present here the formalization in Coq/SSReflect…
In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered--with no particular meaning to the given order of the variables. Yet, successful learning is often…
A large fraction of an XML document typically consists of text data. The XPath query language allows text search via the equal, contains, and starts-with predicates. Such predicates can efficiently be implemented using a compressed…