Related papers: Composite repetition-aware data structures

200 papers

Highly-repetitive collections of strings are increasingly being amassed by genome sequencing and genetic variation experiments, as well as by storing all versions of human-generated files, like webpages and source code. Existing indexes for…

Data Structures and Algorithms · Computer Science 2016-04-22 Djamal Belazzougui , Fabio Cunial , Travis Gagie , Nicola Prezza , Mathieu Raffinot

We propose algorithms that, given the input string of length $n$ over an integer alphabet of size $\sigma$, construct the Burrows-Wheeler transform (BWT), the permuted longest-common-prefix (PLCP) array, and the LZ77 parsing in…

Data Structures and Algorithms · Computer Science 2020-12-09 Dominik Kempa

In this paper, we propose a novel approach to combine compact directed acyclic word graphs (CDAWGs) and grammar-based compression. This leads us to an efficient self-index, called Linear-size CDAWGs (L-CDAWGs), which can be…

Data Structures and Algorithms · Computer Science 2017-07-28 Takuya Takagi , Keisuke Goto , Yuta Fujishige , Shunsuke Inenaga , Hiroki Arimura

In this paper, we present the first study of the computational complexity of converting an automata-based text index structure, called the Compact Directed Acyclic Word Graph (CDAWG), of size $e$ for a text $T$ of length $n$ into other text…

Data Structures and Algorithms · Computer Science 2023-08-07 Hiroki Arimura , Shunsuke Inenaga , Yasuaki Kobayashi , Yuto Nakashima , Mizuki Sue

We present an algorithm for building the extended BWT (eBWT) of a string collection from its grammar-compressed representation. Our technique exploits the string repetitions captured by the grammar to boost the computation of the eBWT.…

Genomics · Quantitative Biology 2021-02-10 Diego Diaz-Dominguez , Gonzalo Navarro

The boom of genomic sequencing makes compression of sets of sequences inescapable. This underlies the need for multi-string indexing data structures that help compress the data. The most prominent example of such data structures is the…

Data Structures and Algorithms · Computer Science 2021-11-18 Bastien Cazaux , Eric Rivals

The compact directed acyclic word graph (CDAWG) of a string $T$ is an index occupying $O(\mathsf{e})$ space, where $\mathsf{e}$ is the number of right extensions of maximal repeats in $T$. For highly repetitive datasets, the measure…

Data Structures and Algorithms · Computer Science 2024-10-22 Shunsuke Inenaga , Dmitry Kosolobov

Suffix trees are one of the most versatile data structures in stringology, with many applications in bioinformatics. Their main drawback is their size, which can be tens of times larger than the input sequence. Much effort has been put into…

Data Structures and Algorithms · Computer Science 2017-12-18 Andrea Farruggia , Travis Gagie , Gonzalo Navarro , Simon J. Puglisi , Jouni Sirén

Compact directed acyclic word graphs (CDAWGs) [Blumer et al. 1987] are a fundamental data structure on strings with applications in text pattern searching, data compression, and pattern discovery. Intuitively, the CDAWG of a string $T$ is…

Data Structures and Algorithms · Computer Science 2025-02-11 Rikuya Hamai , Hiroto Fujimaru , Shunsuke Inenaga

The compact directed acyclic word graph (CDAWG) is the minimal compact automaton that recognizes all the suffixes of a string. Classically the CDAWG has been implemented as an index of the string it recognizes, requiring $o(n)$ space for a…

Data Structures and Algorithms · Computer Science 2024-07-15 Alan M. Cleary , Joseph Winjum , Jordan Dood , Shunsuke Inenaga

Compressed suffix arrays (CSAs) index large repetitive collections and are key in many text applications. The r-index and its derivatives combine the run-length Burrows-Wheeler Transform (BWT) with suffix array sampling to achieve space…

Data Structures and Algorithms · Computer Science 2026-02-20 Diego Díaz-Domínguez , Veli Mäkinen

Given a string $T$, it is known that its suffix tree can be represented using the compact directed acyclic word graph (CDAWG) with $e_T$ arcs, taking overall $O(e_T+e_{{\overline{T}}})$ words of space, where ${\overline{T}}$ is the reverse…

Data Structures and Algorithms · Computer Science 2017-05-25 Djamal Belazzougui , Fabio Cunial

The run-length compressed Burrows-Wheeler transform (RLBWT) used in conjunction with the backward search introduced in the FM index is the centerpiece of most compressed indexes working on highly-repetitive data sets like biological…

Data Structures and Algorithms · Computer Science 2021-10-05 Jin Jie Deng , Wing-Kai Hon , Dominik Köppl , Kunihiko Sadakane

The Lempel-Ziv factorization (LZ77) and the Run-Length encoded Burrows-Wheeler Transform (RLBWT) are two important tools in text compression and indexing, with their sizes $z$ and $r$ closely related to the amount of text…

Data Structures and Algorithms · Computer Science 2017-02-07 Alberto Policriti , Nicola Prezza

Computing the matching statistics of a string $P[1..m]$ with respect to a text $T[1..n]$ is a fundamental problem which has application to genome sequence comparison. In this paper, we study the problem of computing the matching…

Data Structures and Algorithms · Computer Science 2022-01-14 Younan Gao

The compression of highly repetitive strings (i.e., strings with many repetitions) has been a central research topic in string processing, and quite a few compression methods for these strings have been proposed thus far. Among them, an…

Data Structures and Algorithms · Computer Science 2022-02-17 Takaaki Nishimoto , Shunsuke Kanda , Yasuo Tabei

We present a new semi-external algorithm that builds the Burrows-Wheeler transform variant of Bauer et al. (a.k.a. BCR BWT) in linear expected time. Our method uses compression techniques to reduce computational costs when the input is…

Data Structures and Algorithms · Computer Science 2023-08-15 Diego Díaz-Domínguez , Gonzalo Navarro

The Burrows-Wheeler-Transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms…

Data Structures and Algorithms · Computer Science 2024-04-16 Sara Giuliani , Shunsuke Inenaga , Zsuzsanna Lipták , Nicola Prezza , Marinella Sciortino , Anna Toffanello

Suffix trees are a fundamental data structure in stringology, but their space usage, though linear, is an important problem for their applications. We design and implement a new compressed suffix tree targeted to highly repetitive texts, such…

Data Structures and Algorithms · Computer Science 2019-02-12 Manuel Cáceres , Gonzalo Navarro

The compact directed acyclic word graph (CDAWG) of a string $T$ of length $n$ takes space proportional just to the number $e$ of right extensions of the maximal repeats of $T$, and it is thus an appealing index for highly repetitive…

Data Structures and Algorithms · Computer Science 2017-09-27 Djamal Belazzougui , Fabio Cunial