Related papers: Relative Select
We describe a data structure that stores a string $S$ in space similar to that of its Lempel-Ziv encoding and efficiently supports access, rank and select queries. These queries are fundamental for implementing succinct and compressed data…
In this article, we show how to transform a colored de Bruijn graph (dBG) into a practical index for processing massive sets of sequencing reads. Similar to previous works, we encode an instance of a colored dBG of the set using BOSS and a…
We introduce the notion of de Bruijn entropy of an Eulerian quiver and show how the corresponding relative entropy can be applied to practical string similarity problems. This approach explicitly links the combinatorial and…
The problem of storing a set of strings --- a string dictionary --- in compact form appears naturally in many cases. While classically it has represented a small part of the whole data to be processed (e.g., for Natural Language processing…
Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their…
Let G be a graph with a list assignment L. Suppose a preferred color is given for some of the vertices; how many of these preferences can be respected when L-coloring G? We explore several natural questions arising in this context, and…
A 'degenerate string' is a sequence of subsets of some alphabet; it represents any string obtainable by selecting one character from each set from left to right. Recently, Alanko et al. generalized the rank-select problem to degenerate…
We present a space- and time-efficient fully dynamic implementation de Bruijn graphs, which can also support fixed-length jumbled pattern matching.
Pangenomes serve as a framework for joint analysis of genomes of related organisms. Several pangenome models were proposed, offering different functionalities, applications provided by available tools, their efficiency etc. Among them, two…
Let P be a set of n points and each of the points is colored with one of the k possible colors. We present efficient algorithms to pre-process P such that for a given query point q, we can quickly identify the smallest color spanning object…
In this paper, we consider the problem of identifying patterns of interest in colored strings. A colored string is a string where each position is assigned one of a finite set of colors. Our task is to find substrings of the colored string…
Relevance Models are well-known retrieval models and capable of producing competitive results. However, because they use query expansion they can be very slow. We address this slowness by incorporating two variants of locality sensitive…
Rank and select queries on bitmaps are essential building bricks of many compressed data structures, including text indexes, membership and range supporting spatial data structures, compressed graphs, and more. Theoretically considered yet…
In modern ranking problems, different and disparate representations of the items to be ranked are often available. It is sensible, then, to try to combine these representations to improve ranking. Indeed, learning to rank via combining…
We consider compact representations of collections of similar strings that support random access queries. The collection of strings is given by a rooted tree where edges are labeled by an edit operation (inserting, deleting, or replacing a…
A graph is $\ell$-choosable if, for any choice of lists of $\ell$ colors for each vertex, there is a list coloring, which is a coloring where each vertex receives a color from its list. We study complexity issues of choosability of graphs…
We prove that sparse string graphs in a fixed surface have linear expansion. We extend this result to the more general setting of sparse region intersection graphs over any proper minor-closed class. The proofs are combinatorial and…
De novo DNA assembly is a fundamental task in Bioinformatics, and finding Eulerian paths on de Bruijn graphs is one of the dominant approaches to it. In most of the cases, there may be no one order for the de Bruijn graph that works well…
The main challenge in de novo assembly of NGS data is certainly to deal with repeats that are longer than the reads. This is particularly true for RNA- seq data, since coverage information cannot be used to flag repeated sequences, of which…
Given a static reference string $R$ and a source string $S$, a relative compression of $S$ with respect to $R$ is an encoding of $S$ as a sequence of references to substrings of $R$. Relative compression schemes are a classic model of…