Related papers: Coded Shotgun Sequencing
DNA sequencing is the basic workhorse of modern-day biology and medicine. Shotgun sequencing is the dominant technique used: many randomly located short fragments called reads are extracted from the DNA sequence, and these reads are…
The prevalent technique for DNA sequencing consists of two main steps: shotgun sequencing, where many randomly located fragments, called reads, are extracted from the overall sequence, followed by an assembly algorithm that aims to…
Genome sequencing is the basis for many modern biological and medicinal studies. With recent technological advances, metagenomics has become a problem of interest. This problem entails the analysis and reconstruction of multiple DNA…
The shotgun sequencing process involves fragmenting a long DNA sequence (input string) into numerous shorter, unordered, and overlapping segments (referred to as \emph{reads}). The reads are sequenced, and later aligned to reconstruct the…
In shotgun sequencing, the input string (typically, a long DNA sequence composed of nucleotide bases) is sequenced as multiple overlapping fragments of much shorter lengths (called \textit{reads}). Modelling the shotgun sequencing pipeline…
We study permutations over the set of $\ell$-grams, that are feasible in the sense that there is a sequence whose $\ell$-gram frequency has the same ranking as the permutation. Codes, which are sets of feasible permutations, protect…
Current techniques in sequencing a genome allow a service provider (e.g. a sequencing company) to have full access to the genome information, and thus the privacy of individuals regarding their lifetime secret is violated. In this paper, we…
The DNA storage channel is considered, in which a codeword is comprised of $M$ unordered DNA molecules. At reading time, $N$ molecules are sampled with replacement, and then each molecule is sequenced. A coded-index concatenated-coding…
The DNA storage channel is considered, in which the $M$ Deoxyribonucleic acid (DNA) molecules comprising each codeword are stored without order, sampled $N$ times with replacement, and then sequenced over a discrete memoryless channel. For…
DNA sequencing has faced a huge demand since it was first introduced as a service to the public. This service is often offloaded to the sequencing companies who will have access to full knowledge of individuals' sequences, a major violation…
Synthesis of DNA molecules offers unprecedented advances in storage technology. Yet, the microscopic world in which these molecules reside induces error patterns that are fundamentally different from their digital counterparts. Hence, to…
We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in…
Although the expenses associated with DNA sequencing have been rapidly decreasing, the current cost of sequencing information stands at roughly \$120/GB, which is dramatically more expensive than reading from existing archival storage…
As DNA data storage moves closer to practical deployment, minimizing sequencing coverage depth is essential to reduce both operational costs and retrieval latency. This paper addresses the recently studied Random Access Problem, which…
DNA is a leading candidate for the next archival storage medium due to its density, durability and sustainability. To read (and write) data, DNA storage exploits technology that has been developed over decades to sequence naturally occurring…
DNA data storage systems encode digital data into DNA strands, enabling dense and durable storage. Efficient data retrieval depends on coverage depth, a key performance metric. We study the random access coverage depth problem and focus on…
We study the amount of reliable information that can be stored in a DNA-based storage system with noisy sequencing, where each codeword is composed of short DNA molecules. We analyze a concatenated coding scheme, where the outer code is…
DNA storage is now being considered as a new archival storage method for its durability and high information density, but it still faces challenges such as high costs and low throughput. By reducing sequencing sample size for decoding…
DNA has immense potential as an emerging data storage medium. The principle of DNA storage is the conversion and flow of digital information between binary code stream, quaternary base, and actual DNA fragments. This process will inevitably…
The coverage depth problem in DNA data storage is about minimizing the expected number of reads until all data is recovered. When they exist, MDS codes offer the best performance in this context. This paper focuses on the scenario where the…