Related papers: Memory Efficient De Bruijn Graph Construction

SparseAssembler: de novo Assembly with the Sparse de Bruijn Graph

de Bruijn graph-based algorithms are one of the two most widely used approaches for de novo genome assembly. A major limitation of this approach is the large computational memory space requirement to construct the de Bruijn graph, which…

Data Structures and Algorithms · Computer Science 2011-07-11 Chengxi Ye, Zhanshan Sam Ma, Charles H. Cannon, Mihai Pop, Douglas W. Yu

Efficient Parallel and Out of Core Algorithms for Constructing Large Bi-directed de Bruijn Graphs

Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories -- based on the data structures which…

Data Structures and Algorithms · Computer Science 2010-03-10 Vamsi Kundeti, Sanguthevar Rajasekaran, Hieu Dinh

SparseAssembler2: Sparse k-mer Graph for Memory Efficient Genome Assembly

The formal version of our work has been published in BMC Bioinformatics and can be found here: http://www.biomedcentral.com/1471-2105/13/S6/S1 Motivation: To tackle the problem of huge memory usage associated with de Bruijn graph-based…

Data Structures and Algorithms · Computer Science 2013-01-10 Chengxi Ye, Charles H. Cannon, Zhanshan Sam Ma, Douglas W. Yu, Mihai Pop

MSPKmerCounter: A Fast and Memory Efficient Approach for K-mer Counting

A major challenge in next-generation genome sequencing (NGS) is to assemble massive overlapping short reads that are randomly sampled from DNA fragments. To complete assembling, one needs to finish a fundamental task in many leading…

Genomics · Quantitative Biology 2015-05-26 Yang Li, Xifeng Yan
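The MSPKmerCounter snippet stops before describing its method (minimum substring partitioning), so the sketch below only illustrates the baseline task that such tools speed up: counting canonical k-mers with an in-memory hash table. It assumes reads contain only A, C, G and T (no N handling); the function names are hypothetical, not taken from the paper.

```python
from collections import Counter

# Complement table for DNA bases (assumes reads use only A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Pick the lexicographically smaller of a k-mer and its reverse
    complement, so both strands of the same locus share one key."""
    return min(kmer, reverse_complement(kmer))


def count_kmers(reads, k: int) -> Counter:
    """Count canonical k-mers across all reads with a hash table."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    return counts
```

For example, `count_kmers(["ACGTT", "AACGT"], 3)` folds ACG and CGT (reverse complements of each other) into one key. The table holds one entry per distinct k-mer, which is the memory cost that memory-efficient counters aim to reduce.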

Scaling metagenome sequence assembly with probabilistic de Bruijn graphs

Deep sequencing has enabled the investigation of a wide range of environmental microbial ecosystems, but the high memory requirements for de novo assembly of short-read shotgun sequencing data from these complex populations are an…

Genomics · Quantitative Biology 2015-06-03 Jason Pell, Arend Hintze, Rosangela Canino-Koning, Adina Howe, James M. Tiedje, C. Titus Brown

Lock-free de Bruijn graph

De Bruijn graph is one of the most important data structures used in de-novo genome assembly algorithms, especially for NGS data. There is a growing need for parallel data structures and algorithms due to the increasing number of cores in…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-08 Daniel Górniak, Robert Nowak

Succinct Data Structures for Assembling Large Genomes

Motivation: Second generation sequencing technology makes it feasible for many researchers to obtain enough sequence reads to attempt the de novo assembly of higher eukaryotes (including mammals). De novo assembly not only provides a tool…

Genomics · Quantitative Biology 2010-08-17 Thomas C Conway, Andrew J Bromage

On the representation of de Bruijn graphs

The de Bruijn graph plays an important role in bioinformatics, especially in the context of de novo assembly. However, the representation of the de Bruijn graph in memory is a computational bottleneck for many assemblers. Recent papers…

Quantitative Methods · Quantitative Biology 2014-10-07 Rayan Chikhi, Antoine Limasset, Shaun Jackman, Jared Simpson, Paul Medvedev

Simulating the DNA String Graph in Succinct Space

Converting a set of sequencing reads into a lossless compact data structure that encodes all the relevant biological information is a major challenge. The classical approaches are to build the string graph or the de Bruijn graph. Each has…

Data Structures and Algorithms · Computer Science 2019-12-02 Diego Díaz-Domínguez, Travis Gagie, Gonzalo Navarro

A step towards neural genome assembly

De novo genome assembly focuses on finding connections between a vast amount of short sequences in order to reconstruct the original genome. The central problem of genome assembly could be described as finding a Hamiltonian path through a…

Machine Learning · Computer Science 2020-11-11 Lovro Vrček, Petar Veličković, Mile Šikić

Variable-Order de Bruijn Graphs

The de Bruijn graph $G_K$ of a set of strings $S$ is a key data structure in genome assembly that represents overlaps between all the $K$-length substrings of $S$. Construction and navigation of the graph is a space and time bottleneck in…

Data Structures and Algorithms · Computer Science 2014-11-18 Christina Boucher, Alex Bowe, Travis Gagie, Simon J. Puglisi, Kunihiko Sadakane
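To make the definition of $G_K$ concrete, here is a minimal sketch that assumes one common node-centric convention: each distinct K-mer of S is a node, and there is an edge u → v whenever the (K−1)-suffix of u equals the (K−1)-prefix of v. It does not attempt the succinct or variable-order representation the paper is about; the function names are hypothetical.

```python
from collections import defaultdict


def kmers(strings, k):
    """Collect the distinct k-length substrings of every string."""
    out = set()
    for s in strings:
        for i in range(len(s) - k + 1):
            out.add(s[i:i + k])
    return out


def de_bruijn_graph(strings, k):
    """Node-centric de Bruijn graph: one node per distinct k-mer, and an
    edge u -> v whenever the (k-1)-suffix of u equals the (k-1)-prefix of v.
    Returns a dict mapping each node to its sorted list of successors."""
    nodes = kmers(strings, k)
    # Index nodes by their (k-1)-prefix so successors are found by lookup.
    by_prefix = defaultdict(list)
    for v in nodes:
        by_prefix[v[:-1]].append(v)
    return {u: sorted(by_prefix[u[1:]]) for u in sorted(nodes)}
```

For instance, `de_bruijn_graph(["ACGTC", "ACGTA"], 3)` gives node CGT two successors, GTA and GTC: the kind of branch an assembler has to resolve. Under this convention edges are implied by overlap alone, not by adjacency in a read.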

Using cascading Bloom filters to improve the memory usage for de Brujin graphs

De Bruijn graphs are widely used in bioinformatics for processing next-generation sequencing data. Due to a very large size of NGS datasets, it is essential to represent de Bruijn graphs compactly, and several approaches to this problem…

Data Structures and Algorithms · Computer Science 2013-05-22 Kamil Salikhov, Gustavo Sacomoto, Gregory Kucherov
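A Bloom-filter representation stores only which k-mers are present and recovers edges on the fly by querying the four possible one-base extensions of a node. The sketch below uses a single plain Bloom filter, with SHA-256 over a numeric salt as an arbitrarily chosen hash family; it does not implement the cascading scheme named in the title, and the class and function names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Bit-array Bloom filter over strings using `num_hashes` salted SHA-256 hashes."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive each hash by prefixing a numeric salt to the item.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def successors(bloom: BloomFilter, kmer: str) -> list:
    """Implicit de Bruijn graph navigation: keep each one-base extension
    kmer[1:] + base that the filter reports as present. Inserted k-mers are
    never missed, but a false positive can add a spurious neighbour."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bloom]
```

After inserting the 3-mers of ACGTC, `successors(bloom, "ACG")` is guaranteed to contain CGT; any extra entry would be a false positive, which is the error source that more elaborate Bloom-filter schemes try to remove.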

TwoPaCo: An efficient algorithm to build the compacted de Bruijn graph from many complete genomes

Motivation: De Bruijn graphs have been proposed as a data structure to facilitate the analysis of related whole genome sequences, in both a population and comparative genomic settings. However, current approaches do not scale well to many…

Data Structures and Algorithms · Computer Science 2016-02-19 Ilia Minkin, Son Pham, Paul Medvedev

Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly

One of the most computationally intensive tasks in computational biology is de novo genome assembly, the decoding of the sequence of an unknown genome from redundant and erroneous short sequences. A common assembly paradigm identifies…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-21 Giulia Guidi, Oguz Selvitopi, Marquita Ellis, Leonid Oliker, Katherine Yelick, Aydin Buluc

Distributed-Memory Parallel Contig Generation for De Novo Long-Read Genome Assembly

De novo genome assembly, i.e., rebuilding the sequence of an unknown genome from redundant and erroneous short sequences, is a key but computationally intensive step in many genomics pipelines. The exponential growth of genomic data is…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-12 Giulia Guidi, Gabriel Raulet, Daniel Rokhsar, Leonid Oliker, Katherine Yelick, Aydin Buluc

Read Mapping on de Bruijn graph

Background Next Generation Sequencing (NGS) has dramatically enhanced our ability to sequence genomes, but not to assemble them. In practice, many published genome sequences remain in the state of a large set of contigs. Each contig…

Data Structures and Algorithms · Computer Science 2018-02-14 Antoine Limasset, Bastien Cazaux, Eric Rivals, Pierre Peterlongo

Assembling Omnitigs using Hidden-Order de Bruijn Graphs

De novo DNA assembly is a fundamental task in Bioinformatics, and finding Eulerian paths on de Bruijn graphs is one of the dominant approaches to it. In most of the cases, there may be no one order for the de Bruijn graph that works well…

Data Structures and Algorithms · Computer Science 2018-05-15 Diego Díaz-Domínguez, Djamal Belazzougui, Travis Gagie, Veli Mäkinen, Gonzalo Navarro, Simon J. Puglisi
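The Eulerian-path formulation mentioned in the snippet can be sketched for a single fixed order k: treat every (k−1)-mer as a node and every k-mer as an edge from its prefix to its suffix, then spell the string along an Eulerian path found with Hierholzer's algorithm. This is a plain fixed-order graph, not the hidden-order structure or omnitig computation of the paper, and it assumes error-free k-mers that admit an Eulerian path; the function name is hypothetical.

```python
from collections import defaultdict


def eulerian_assembly(kmers):
    """Spell a string from k-mers via an Eulerian path in the edge-centric
    de Bruijn graph (nodes are (k-1)-mers, each k-mer is an edge from its
    prefix to its suffix). Assumes a non-empty input with an Eulerian path."""
    graph = defaultdict(list)
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for km in kmers:
        u, v = km[:-1], km[1:]
        graph[u].append(v)
        out_deg[u] += 1
        in_deg[v] += 1

    # Path case: start where out-degree exceeds in-degree by one.
    # Circuit case: start at any node that has outgoing edges.
    nodes = set(out_deg) | set(in_deg)
    start = next((n for n in nodes if out_deg[n] - in_deg[n] == 1),
                 next(iter(graph)))

    # Hierholzer's algorithm, iterative form: walk unused edges, and emit a
    # node once it has no unused outgoing edges left.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()

    # Consecutive nodes overlap by k-2 characters: append one base per step.
    return path[0] + "".join(node[-1] for node in path[1:])
```

For example, the 3-mers of ACGTTGCA, supplied in any order, spell ACGTTGCA back. When a repeat is longer than k−1 the graph admits several Eulerian paths and the spelled string is no longer unique, which is one reason a single order k may not suit every region of a genome.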

On Solving the Minimum Common String Partition Problem by Decision Diagrams

In the Minimum Common String Partition Problem (MCSP), we are given two strings on input, and we want to partition both into the same collection of substrings, minimizing the number of the substrings in the partition. This combinatorial…

Data Structures and Algorithms · Computer Science 2021-10-12 Miloš Chromý, Markus Sinnl
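The MCSP definition above can be checked on tiny inputs by exhaustive search: try partitions of the first string in order of increasing size, and test by backtracking whether the second string can be cut into the same multiset of pieces. This brute force is exponential and is not the decision-diagram approach of the paper; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def partitions_with_blocks(s, blocks):
    """Yield every way to cut `s` into exactly `blocks` non-empty substrings."""
    for cuts in combinations(range(1, len(s)), blocks - 1):
        bounds = (0, *cuts, len(s))
        yield [s[bounds[i]:bounds[i + 1]] for i in range(blocks)]


def can_tile(s, pieces: Counter) -> bool:
    """True if `s` can be cut into exactly the multiset `pieces` (backtracking)."""
    if not s:
        return sum(pieces.values()) == 0
    for piece in list(pieces):
        if pieces[piece] > 0 and s.startswith(piece):
            pieces[piece] -= 1
            ok = can_tile(s[len(piece):], pieces)
            pieces[piece] += 1  # undo the choice before trying the next piece
            if ok:
                return True
    return False


def mcsp(a, b):
    """Exhaustive minimum common string partition for tiny non-empty inputs.
    Returns the smallest common collection of substrings (sorted), or None
    when a and b have different letter counts and no common partition exists."""
    if Counter(a) != Counter(b):
        return None
    for blocks in range(1, len(a) + 1):
        for parts in partitions_with_blocks(a, blocks):
            if can_tile(b, Counter(parts)):
                return sorted(parts)
    return None
```

For instance, `mcsp("abcd", "cdab")` needs two substrings, ab and cd. Because partition sizes are tried in increasing order, the first common partition found is a minimum one.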

PANDA: Processing-in-MRAM Accelerated De Bruijn Graph based DNA Assembly

Spurred by widening gap between data processing speed and data communication speed in Von-Neumann computing architectures, some bioinformatic applications have harnessed the computational power of Processing-in-Memory (PIM) platforms.…

Hardware Architecture · Computer Science 2020-08-17 Shaahin Angizi, Naima Ahmed Fahmi, Wei Zhang, Deliang Fan

Practical algorithms for Hierarchical overlap graphs

Genome assembly is a prominent problem studied in bioinformatics, which computes the source string using a set of its overlapping substrings. Classically, genome assembly uses assembly graphs built using this set of substrings to compute…

Data Structures and Algorithms · Computer Science 2024-09-24 Saumya Talera, Parth Bansal, Shabnam Khan, Shahbaz Khan