Related papers: Succinct Data Structures for Assembling Large Geno…

Memory Efficient De Bruijn Graph Construction

Massively parallel DNA sequencing technologies are revolutionizing genomics research. Billions of short reads generated at low costs can be assembled for reconstructing the whole genomes. Unfortunately, the large memory footprint of the…

Data Structures and Algorithms · Computer Science 2012-07-17 Yang Li , Pegah Kamousi , Fangqiu Han , Shengqi Yang , Xifeng Yan , Subhash Suri

SparseAssembler: de novo Assembly with the Sparse de Bruijn Graph

de Bruijn graph-based algorithms are one of the two most widely used approaches for de novo genome assembly. A major limitation of this approach is the large computational memory space requirement to construct the de Bruijn graph, which…

Data Structures and Algorithms · Computer Science 2011-07-11 Chengxi Ye , Zhanshan Sam Ma , Charles H. Cannon , Mihai Pop , Douglas W. Yu

Scaling metagenome sequence assembly with probabilistic de Bruijn graphs

Deep sequencing has enabled the investigation of a wide range of environmental microbial ecosystems, but the high memory requirements for de novo assembly of short-read shotgun sequencing data from these complex populations are an…

Genomics · Quantitative Biology 2015-06-03 Jason Pell , Arend Hintze , Rosangela Canino-Koning , Adina Howe , James M. Tiedje , C. Titus Brown

On the representation of de Bruijn graphs

The de Bruijn graph plays an important role in bioinformatics, especially in the context of de novo assembly. However, the representation of the de Bruijn graph in memory is a computational bottleneck for many assemblers. Recent papers…

Quantitative Methods · Quantitative Biology 2014-10-07 Rayan Chikhi , Antoine Limasset , Shaun Jackman , Jared Simpson , Paul Medvedev

SparseAssembler2: Sparse k-mer Graph for Memory Efficient Genome Assembly

The formal version of our work has been published in BMC Bioinformatics and can be found here: http://www.biomedcentral.com/1471-2105/13/S6/S1 Motivation: To tackle the problem of huge memory usage associated with de Bruijn graph-based…

Data Structures and Algorithms · Computer Science 2013-01-10 Chengxi Ye , Charles H. Cannon , Zhanshan Sam Ma , Douglas W. Yu , Mihai Pop

A step towards neural genome assembly

De novo genome assembly focuses on finding connections between a vast amount of short sequences in order to reconstruct the original genome. The central problem of genome assembly could be described as finding a Hamiltonian path through a…

Machine Learning · Computer Science 2020-11-11 Lovro Vrček , Petar Veličković , Mile Šikić

Space efficient merging of de Bruijn graphs and Wheeler graphs

The merging of succinct data structures is a well established technique for the space efficient construction of large succinct indexes. In the first part of the paper we propose a new algorithm for merging succinct representations of de…

Data Structures and Algorithms · Computer Science 2021-07-13 Lavinia Egidi , Felipe A. Louza , Giovanni Manzini

Lock-free de Bruijn graph

De Bruijn graph is one of the most important data structures used in de-novo genome assembly algorithms, especially for NGS data. There is a growing need for parallel data structures and algorithms due to the increasing number of cores in…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-08 Daniel Górniak , Robert Nowak

Simulating the DNA String Graph in Succinct Space

Converting a set of sequencing reads into a lossless compact data structure that encodes all the relevant biological information is a major challenge. The classical approaches are to build the string graph or the de Bruijn graph. Each has…

Data Structures and Algorithms · Computer Science 2019-12-02 Diego Díaz-Domínguez , Travis Gagie , Gonzalo Navarro

Compression of structured high-throughput sequencing data

Large biological datasets are being produced at a rapid pace and create substantial storage challenges, particularly in the domain of high-throughput sequencing (HTS). Most approaches currently used to store HTS data are either unable to…

Quantitative Methods · Quantitative Biology 2014-03-05 Fabien Campagne , Kevin C. Dorff , Nyasha Chambwe , James T. Robinson , Jill P. Mesirov , Thomas D. Wu

Supregraph: Enabling Information-Optimal Assembly Graph Representation of a Read Set

The first step in any genome assembly algorithm entails the conversion from the domain of strings and overlaps to the language of graphs and paths, typically using one of the two conventional methods: de Bruijn graphs or overlap graphs.…

Genomics · Quantitative Biology 2026-04-27 Anton Bankevich

Compression of next-generation sequencing reads aided by highly efficient de novo assembly

We present Quip, a lossless compression algorithm for next-generation sequencing data in the FASTQ and SAM/BAM formats. In addition to implementing reference-based compression, we have developed, to our knowledge, the first assembly-based…

Quantitative Methods · Quantitative Biology 2012-07-11 Daniel C. Jones , Walter L. Ruzzo , Xinxia Peng , Michael G. Katze

DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies

(An updated version of this manuscript has been accepted to Scientific Reports in 2016, please refer to http://www.nature.com/articles/srep31900) The highly anticipated transition from next generation sequencing (NGS) to third generation…

Genomics · Quantitative Biology 2016-09-06 Chengxi Ye , Chris Hill , Shigang Wu , Jue Ruan , Zhanshan Ma

Compression of high throughput sequencing data with probabilistic de Bruijn graph

Motivation: Data volumes generated by next-generation sequencing technologies is now a major concern, both for storage and transmission. This triggered the need for more efficient methods than general purpose compression tools, such as…

Data Structures and Algorithms · Computer Science 2014-12-19 Gaëtan Benoit , Claire Lemaitre , Dominique Lavenier , Guillaume Rizk

Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly

One of the most computationally intensive tasks in computational biology is de novo genome assembly, the decoding of the sequence of an unknown genome from redundant and erroneous short sequences. A common assembly paradigm identifies…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-21 Giulia Guidi , Oguz Selvitopi , Marquita Ellis , Leonid Oliker , Katherine Yelick , Aydin Buluc

TwoPaCo: An efficient algorithm to build the compacted de Bruijn graph from many complete genomes

Motivation: De Bruijn graphs have been proposed as a data structure to facilitate the analysis of related whole genome sequences, in both a population and comparative genomic settings. However, current approaches do not scale well to many…

Data Structures and Algorithms · Computer Science 2016-02-19 Ilia Minkin , Son Pham , Paul Medvedev

Practical algorithms for Hierarchical overlap graphs

Genome assembly is a prominent problem studied in bioinformatics, which computes the source string using a set of its overlapping substrings. Classically, genome assembly uses assembly graphs built using this set of substrings to compute…

Data Structures and Algorithms · Computer Science 2024-09-24 Saumya Talera , Parth Bansal , Shabnam Khan , Shahbaz Khan

Assembling Omnitigs using Hidden-Order de Bruijn Graphs

De novo DNA assembly is a fundamental task in Bioinformatics, and finding Eulerian paths on de Bruijn graphs is one of the dominant approaches to it. In most of the cases, there may be no one order for the de Bruijn graph that works well…

Data Structures and Algorithms · Computer Science 2018-05-15 Diego Díaz-Domínguez , Djamal Belazzougui , Travis Gagie , Veli Mäkinen , Gonzalo Navarro , Simon J. Puglisi

Scalable De Novo Genome Assembly Using Pregel

De novo genome assembly is the process of stitching short DNA sequences to generate longer DNA sequences, without using any reference sequence for alignment. It enables high-throughput genome sequencing and thus accelerates the discovery of…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-16 Da Yan , Hongzhi Chen , James Cheng , Zhenkun Cai , Bin Shao

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species

Background - The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly…

Genomics · Quantitative Biology 2015-02-02 Keith R. Bradnam , Joseph N. Fass , Anton Alexandrov , Paul Baranay , Michael Bechner , İnanç Birol , Sébastien Boisvert , Jarrod A. Chapman , Guillaume Chapuis , Rayan Chikhi , Hamidreza Chitsaz , Wen-Chi Chou , Jacques Corbeil , Cristian Del Fabbro , T. Roderick Docking , Richard Durbin , Dent Earl , Scott Emrich , Pavel Fedotov , Nuno A. Fonseca , Ganeshkumar Ganapathy , Richard A. Gibbs , Sante Gnerre , Élénie Godzaridis , Steve Goldstein , Matthias Haimel , Giles Hall , David Haussler , Joseph B. Hiatt , Isaac Y. Ho , Jason Howard , Martin Hunt , Shaun D. Jackman , David B Jaffe , Erich Jarvis , Huaiyang Jiang , Sergey Kazakov , Paul J. Kersey , Jacob O. Kitzman , James R. Knight , Sergey Koren , Tak-Wah Lam , Dominique Lavenier , François Laviolette , Yingrui Li , Zhenyu Li , Binghang Liu , Yue Liu , Ruibang Luo , Iain MacCallum , Matthew D MacManes , Nicolas Maillet , Sergey Melnikov , Bruno Miguel Vieira , Delphine Naquin , Zemin Ning , Thomas D. Otto , Benedict Paten , Octávio S. Paulo , Adam M. Phillippy , Francisco Pina-Martins , Michael Place , Dariusz Przybylski , Xiang Qin , Carson Qu , Filipe J Ribeiro , Stephen Richards , Daniel S. Rokhsar , J. Graham Ruby , Simone Scalabrin , Michael C. Schatz , David C. Schwartz , Alexey Sergushichev , Ted Sharpe , Timothy I. Shaw , Jay Shendure , Yujian Shi , Jared T. Simpson , Henry Song , Fedor Tsarev , Francesco Vezzi , Riccardo Vicedomini , Jun Wang , Kim C. Worley , Shuangye Yin , Siu-Ming Yiu , Jianying Yuan , Guojie Zhang , Hao Zhang , Shiguo Zhou , Ian F. Korf