Related papers: Do Read Errors Matter for Genome Assembly?
High read depth can be used to assemble short sequence repeats. Existing genome assemblers fail in repetitive regions longer than the average read. I propose a new algorithm for DNA assembly which uses the relative frequency of reads…
The prevalent technique for DNA sequencing consists of two main steps: shotgun sequencing, where many randomly located fragments, called reads, are extracted from the overall sequence, followed by an assembly algorithm that aims to…
The study of functional genomics, particularly in non-model organisms, has been dramatically improved over the last few years by the use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally…
DNA sequencing is the basic workhorse of modern day biology and medicine. Shotgun sequencing is the dominant technique used: many randomly located short fragments called reads are extracted from the DNA sequence, and these reads are…
The de novo assembly of large, complex genomes is a significant challenge with currently available DNA sequencing technology. While many de novo assembly software packages are available, comparatively little attention has been paid to…
Recent emergence of next-generation DNA sequencing technology has enabled acquisition of genetic information at unprecedented scales. In order to determine the genetic blueprint of an organism, sequencing platforms typically employ…
Recent work identified the fundamental limits on the information requirements in terms of read length and coverage depth required for successful de novo genome reconstruction from shotgun sequencing data, based on the idealistic assumption…
In this paper, fundamental limits in sequencing of a set of closely related DNA molecules are addressed. This problem is called pooled-DNA sequencing, which encompasses many interesting problems such as haplotype phasing, metagenomics, and…
Genome assembly using high throughput data with short reads, arguably, remains an unresolvable task in repetitive genomes, since when the length of a repeat exceeds the read length, it becomes difficult to unambiguously connect the flanking…
Genome sequencing is the basis for many modern biological and medicinal studies. With recent technological advances, metagenomics has become a problem of interest. This problem entails the analysis and reconstruction of multiple DNA…
(An updated version of this manuscript has been accepted to Scientific Reports in 2016, please refer to http://www.nature.com/articles/srep31900) The highly anticipated transition from next generation sequencing (NGS) to third generation…
De novo assembly is the process of reconstructing the genome sequence of an organism from sequencing reads. Genome sequences are essential to biology, and assembly has been a central problem in bioinformatics for four decades. Until…
Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, high error rates of the technology pose a challenge while generating…
Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to…
DNA is a leading candidate as the next archival storage medium due to its density, durability and sustainability. To read (and write) data, DNA storage exploits technology that has been developed over decades to sequence naturally occurring…
The shotgun sequencing process involves fragmenting a long DNA sequence (input string) into numerous shorter, unordered, and overlapping segments (referred to as reads). The reads are sequenced, and later aligned to reconstruct the…
Over the past two decades, a series of works have aimed at studying the problem of genome assembly: the process of reconstructing a genome from sequence reads. An early formulation of the genome assembly problem showed that genome…
Earlier formulations of the DNA assembly problem were all in the context of perfect assembly; i.e., given a set of reads from a long genome sequence, is it possible to perfectly reconstruct the original sequence? In practice, however, it is…
Most DNA sequencing technologies are based on the shotgun paradigm: many short reads are obtained from random unknown locations in the DNA sequence. A fundamental question, studied in arXiv:1203.6233, is what read length and coverage depth…
Short-read DNA sequencing instruments can yield over 1e+12 bases per run, typically composed of reads 150 bases long. Despite this high throughput, de novo assembly algorithms have difficulty reconstructing contiguous genome sequences using…