Related papers: SparseAssembler2: Sparse k-mer Graph for Memory Ef…

SparseAssembler: de novo Assembly with the Sparse de Bruijn Graph

de Bruijn graph-based algorithms are one of the two most widely used approaches for de novo genome assembly. A major limitation of this approach is the large computational memory space requirement to construct the de Bruijn graph, which…

Data Structures and Algorithms · Computer Science 2011-07-11 Chengxi Ye , Zhanshan Sam Ma , Charles H. Cannon , Mihai Pop , Douglas W. Yu

Memory Efficient De Bruijn Graph Construction

Massively parallel DNA sequencing technologies are revolutionizing genomics research. Billions of short reads generated at low costs can be assembled for reconstructing the whole genomes. Unfortunately, the large memory footprint of the…

Data Structures and Algorithms · Computer Science 2012-07-17 Yang Li , Pegah Kamousi , Fangqiu Han , Shengqi Yang , Xifeng Yan , Subhash Suri

Scaling metagenome sequence assembly with probabilistic de Bruijn graphs

Deep sequencing has enabled the investigation of a wide range of environmental microbial ecosystems, but the high memory requirements for de novo assembly of short-read shotgun sequencing data from these complex populations are an…

Genomics · Quantitative Biology 2015-06-03 Jason Pell , Arend Hintze , Rosangela Canino-Koning , Adina Howe , James M. Tiedje , C. Titus Brown

Informed and Automated k-Mer Size Selection for Genome Assembly

Genome assembly tools based on the de Bruijn graph framework rely on a parameter k, which represents a trade-off between several competing effects that are difficult to quantify. There is currently a lack of tools that would automatically…

Genomics · Quantitative Biology 2013-04-23 Rayan Chikhi , Paul Medvedev

Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly

One of the most computationally intensive tasks in computational biology is de novo genome assembly, the decoding of the sequence of an unknown genome from redundant and erroneous short sequences. A common assembly paradigm identifies…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-21 Giulia Guidi , Oguz Selvitopi , Marquita Ellis , Leonid Oliker , Katherine Yelick , Aydin Buluc

Distributed-Memory Parallel Contig Generation for De Novo Long-Read Genome Assembly

De novo genome assembly, i.e., rebuilding the sequence of an unknown genome from redundant and erroneous short sequences, is a key but computationally intensive step in many genomics pipelines. The exponential growth of genomic data is…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-12 Giulia Guidi , Gabriel Raulet , Daniel Rokhsar , Leonid Oliker , Katherine Yelick , Aydin Buluc

Succinct Data Structures for Assembling Large Genomes

Motivation: Second-generation sequencing technology makes it feasible for many researchers to obtain enough sequence reads to attempt the de novo assembly of higher eukaryotes (including mammals). De novo assembly not only provides a tool…

Genomics · Quantitative Biology 2010-08-17 Thomas C Conway , Andrew J Bromage

On the representation of de Bruijn graphs

The de Bruijn graph plays an important role in bioinformatics, especially in the context of de novo assembly. However, the representation of the de Bruijn graph in memory is a computational bottleneck for many assemblers. Recent papers…

Quantitative Methods · Quantitative Biology 2014-10-07 Rayan Chikhi , Antoine Limasset , Shaun Jackman , Jared Simpson , Paul Medvedev

Supregraph: Enabling Information-Optimal Assembly Graph Representation of a Read Set

The first step in any genome assembly algorithm entails the conversion from the domain of strings and overlaps to the language of graphs and paths, typically using one of the two conventional methods: de Bruijn graphs or overlap graphs.…

Genomics · Quantitative Biology 2026-04-27 Anton Bankevich

MSPKmerCounter: A Fast and Memory Efficient Approach for K-mer Counting

A major challenge in next-generation genome sequencing (NGS) is to assemble massive overlapping short reads that are randomly sampled from DNA fragments. To complete assembling, one needs to finish a fundamental task in many leading…

Genomics · Quantitative Biology 2015-05-26 Yang Li , Xifeng Yan

Pangenome-guided sequence assembly via binary optimisation

De novo genome assembly is challenging in highly repetitive regions; however, reference-guided assemblers often suffer from bias. We propose a framework for pangenome-guided sequence assembly, which can resolve short-read data in complex…

Quantum Physics · Physics 2026-02-11 Josh Cudby , James Bonfield , Chenxi Zhou , Richard Durbin , Sergii Strelchuk

Lock-free de Bruijn graph

The de Bruijn graph is one of the most important data structures used in de novo genome assembly algorithms, especially for NGS data. There is a growing need for parallel data structures and algorithms due to the increasing number of cores in…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-08 Daniel Górniak , Robert Nowak

Practical algorithms for Hierarchical overlap graphs

Genome assembly is a prominent problem studied in bioinformatics, which computes the source string using a set of its overlapping substrings. Classically, genome assembly uses assembly graphs built using this set of substrings to compute…

Data Structures and Algorithms · Computer Science 2024-09-24 Saumya Talera , Parth Bansal , Shabnam Khan , Shahbaz Khan

Bermuda: Bidirectional de novo assembly of transcripts with new insights for handling uneven coverage

Motivation: RNA-seq has made it feasible to analyze the whole set of expressed mRNAs. Mapping-based assembly of RNA-seq reads is sometimes infeasible due to the lack of high-quality references. However, de novo assembly is very challenging due…

Genomics · Quantitative Biology 2015-06-19 Qingming Tang , Sheng Wang , Jian Peng , Jianzhu Ma , Jinbo Xu

Learning Genomic Sequence Representations using Graph Neural Networks over De Bruijn Graphs

The rapid expansion of genomic sequence data calls for new methods to achieve robust sequence representations. Existing techniques often neglect intricate structural details, emphasizing mainly contextual information. To address this, we…

Machine Learning · Computer Science 2023-12-08 Kacper Kapuśniak , Manuel Burger , Gunnar Rätsch , Amir Joudaki

Scalable De Novo Genome Assembly Using Pregel

De novo genome assembly is the process of stitching short DNA sequences to generate longer DNA sequences, without using any reference sequence for alignment. It enables high-throughput genome sequencing and thus accelerates the discovery of…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-16 Da Yan , Hongzhi Chen , James Cheng , Zhenkun Cai , Bin Shao

KMC 2: Fast and resource-frugal k-mer counting

Motivation: Building the histogram of occurrences of every k-symbol long substring of nucleotide data is a standard step in many bioinformatics applications, known under the name of k-mer counting. Its applications include developing de…

Data Structures and Algorithms · Computer Science 2017-03-03 Sebastian Deorowicz , Marek Kokot , Szymon Grabowski , Agnieszka Debudaj-Grabysz

Simulating the DNA String Graph in Succinct Space

Converting a set of sequencing reads into a lossless compact data structure that encodes all the relevant biological information is a major challenge. The classical approaches are to build the string graph or the de Bruijn graph. Each has…

Data Structures and Algorithms · Computer Science 2019-12-02 Diego Díaz-Domínguez , Travis Gagie , Gonzalo Navarro

A step towards neural genome assembly

De novo genome assembly focuses on finding connections between a vast amount of short sequences in order to reconstruct the original genome. The central problem of genome assembly could be described as finding a Hamiltonian path through a…

Machine Learning · Computer Science 2020-11-11 Lovro Vrček , Petar Veličković , Mile Šikić

A Novel Compiler Transformation for Fast Sparse Matrix Multiplication in GPUs

Sparse data structures are commonly used in neural networks to reduce the memory footprint. These data structures are compact but cause irregularities such as random memory accesses, which prevent efficient use of the memory hierarchy. GPUs…

Programming Languages · Computer Science 2025-06-19 Hossein Albakri , Kazem Cheshmi