Related papers: An External-Memory Algorithm for String Graph Cons…
Indexing very large collections of strings, such as those produced by the widespread next-generation sequencing technologies, relies heavily on multistring generalizations of the Burrows-Wheeler Transform (BWT): large requirements of…
We present a new scalable, lightweight algorithm to incrementally construct the BWT and FM-index of large string sets such as those produced by Next Generation Sequencing. The algorithm is designed for massive parallelism and can…
Constructing the Burrows-Wheeler transform (BWT) for long strings poses significant challenges regarding construction time and memory usage. We use a prefix of the suffix array to partition a long string into shorter substrings, thereby…
Indexing of very large collections of strings, such as those produced by the widespread sequencing technologies, relies heavily on multi-string generalizations of the Burrows-Wheeler Transform (BWT), and for this problem various in-memory…
The Burrows-Wheeler Transform (BWT) serves as the basis for many important sequence indexes. On very large datasets (e.g. genomic databases), classical BWT construction algorithms are often infeasible because they usually need to have the…
We present an algorithm for building the extended BWT (eBWT) of a string collection from its grammar-compressed representation. Our technique exploits the string repetitions captured by the grammar to boost the computation of the eBWT.…
The Burrows-Wheeler transform has applications in data compression as well as full-text indexing. Despite its important applications and various existing algorithmic approaches, the construction of the transform for large data sets is still…
The Burrows-Wheeler transform (BWT) is a well studied text transformation widely used in data compression and text indexing. The BWT of two strings can also provide similarity measures between them, based on the observation that the more…
The Burrows-Wheeler transform (BWT) is integral to the FM-index, which is used extensively in text compression, indexing, pattern search, and bioinformatics problems such as de novo assembly and read alignment. Thus, efficient construction of the…
Due to the exponential growth of genomic data, constructing dedicated data structures has become the principal bottleneck in common bioinformatics applications. In particular, the Burrows-Wheeler Transform (BWT) is the basis of some of the…
We present a new semi-external algorithm that builds the Burrows-Wheeler transform variant of Bauer et al. (a.k.a. BCR BWT) in linear expected time. Our method uses compression techniques to reduce computational costs when the input is…
Introduced about thirty years ago in the field of Data Compression, the Burrows-Wheeler Transform (BWT) is a string transformation that, besides being a booster of the performance of memoryless compressors, plays a fundamental role in the…
In order to avoid the reference bias introduced by mapping reads to a reference genome, bioinformaticians are investigating reference-free methods for analyzing sequenced genomes. With large projects sequencing thousands of individuals,…
The Burrows-Wheeler transform (BWT) is an invertible text transformation that, given a text $T$ of length $n$, permutes its symbols according to the lexicographic order of the suffixes of $T$. The BWT is one of the most heavily studied algorithms in…
Detecting all the strings that occur in a text more frequently or less frequently than expected according to an IID or a Markov model is a basic problem in string mining, yet current algorithms are based on data structures that are either…
In this paper, we present a number of network-analysis algorithms in the external-memory model. We focus on methods for large naturally sparse graphs, that is, n-vertex graphs that have O(n) edges and are structured so that this sparsity…
We propose an external memory algorithm for the computation of the BWT and LCP array for a collection of sequences. Our algorithm takes the amount of available memory as an input parameter, and tries to make the best use of it by splitting…
The boom in genomic sequencing makes compression of sets of sequences inescapable. This underlies the need for multi-string indexing data structures that help compress the data. The most prominent example of such data structures is the…
The string graph for a collection of next-generation reads is a lossless data representation that is fundamental for de novo assemblers based on the overlap-layout-consensus paradigm. In this paper, we explore a novel approach to compute…
This thesis presents a number of results related to path traversal in trees and graphs. In particular, we focus on data structures which allow such traversals to be performed efficiently in the external memory setting. In addition, for…