Related papers: On the Reverse-Complement String-Duplication Syste…

On the Palindromic/Reverse-Complement Duplication Correcting Codes

Motivated by applications in in-vivo DNA storage, we study codes for correcting duplications. A reverse-complement duplication of length $k$ is the insertion of the reversed and complemented copy of a substring of length $k$ adjacent to its…

Information Theory · Computer Science 2026-02-03 Yubo Sun, Gennian Ge

On the Coding Capacity of Reverse-Complement and Palindromic Duplication-Correcting Codes

We derive the coding capacity for duplication-correcting codes capable of correcting any number of duplications. We do so both for reverse-complement duplications, as well as palindromic (reverse) duplications. We show that except for…

Information Theory · Computer Science 2024-02-21 Lev Yohananov, Moshe Schwartz

On Duplication-Free Codes for Disjoint or Equal-Length Errors

Motivated by applications in DNA storage, we study a setting in which strings are affected by tandem-duplication errors. In particular, we look at two settings: disjoint tandem-duplication errors, and equal-length tandem-duplication errors.…

Information Theory · Computer Science 2024-01-10 Wenjun Yu, Moshe Schwartz

Asymptotically Optimal Codes Correcting Fixed-Length Duplication Errors in DNA Storage Systems

A (tandem) duplication of length $k$ is an insertion of an exact copy of a substring of length $k$ next to its original position. This and related types of impairments are of relevance in modeling communication in the presence of…

Information Theory · Computer Science 2020-08-13 Mladen Kovačević, Vincent Y. F. Tan

Reconstruction from Substrings with Partial Overlap

This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous…

Information Theory · Computer Science 2022-05-10 Yonatan Yehezkeally, Daniella Bar-Lev, Sagi Marcovich, Eitan Yaakobi

The Capacity of Some Pólya String Models

We study random string-duplication systems, which we call Pólya string models. These are motivated by DNA storage in living organisms, and certain random mutation processes that affect their genome. Unlike previous works that study the…

Information Theory · Computer Science 2018-08-21 Ohad Elishco, Farzad Farnoud, Moshe Schwartz, Jehoshua Bruck

Unique Reconstruction of Coded Strings from Multiset Substring Spectra

The problem of reconstructing strings from their substring spectra has a long history and in its most simple incarnation asks for determining under which conditions the spectrum uniquely determines the string. We study the problem of coded…

Information Theory · Computer Science 2019-04-24 Ryan Gabrys, Olgica Milenkovic

Generalized Unique Reconstruction from Substrings

This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous…

Information Theory · Computer Science 2023-04-21 Yonatan Yehezkeally, Daniella Bar-Lev, Sagi Marcovich, Eitan Yaakobi

The Capacity of String-Replication Systems

It is known that the majority of the human genome consists of repeated sequences. Furthermore, it is believed that a significant part of the rest of the genome also originated from repeated sequences and has mutated to its current form. In…

Information Theory · Computer Science 2014-01-21 Farzad Farnoud, Moshe Schwartz, Jehoshua Bruck

Reconstruction of a Single String from a Part of its Composition Multiset

Motivated by applications in polymer-based data storage, we study the problem of reconstructing a string from part of its composition multiset. We give a full description of the structure of the strings that cannot be uniquely reconstructed…

Information Theory · Computer Science 2022-10-18 Zuo Ye, Ohad Elishco

Efficient Pattern Matching on Binary Strings

The binary string matching problem consists in finding all the occurrences of a pattern in a text where both strings are built on a binary alphabet. This is an interesting problem in computer science, since binary data are omnipresent in…

Data Structures and Algorithms · Computer Science 2008-10-15 Simone Faro, Thierry Lecroq

Exhaustive Exact String Matching: The Analysis of the Full Human Genome

Exact string matching has been a fundamental problem in computer science for decades because of many practical applications. Some are related to common procedures, such as searching in files and text editors, or, more recently, to more…

Data Structures and Algorithms · Computer Science 2019-07-29 Konstantinos F. Xylogiannopoulos

Efficient Encoding/Decoding of Irreducible Words for Codes Correcting Tandem Duplications

Tandem duplication is the process of inserting a copy of a segment of DNA adjacent to the original position. Motivated by applications that store data in living organisms, Jain et al. (2017) proposed the study of codes that correct tandem…

Information Theory · Computer Science 2018-01-09 Yeow Meng Chee, Johan Chrisnata, Han Mao Kiah, Tuan Thanh Nguyen

Error-correcting Codes for Short Tandem Duplication and Substitution Errors

Due to its high data density and longevity, DNA is considered a promising medium for satisfying ever-increasing data storage needs. However, the diversity of errors that occur in DNA sequences makes efficient error-correction a challenging…

Information Theory · Computer Science 2020-11-12 Yuanyuan Tang, Farzad Farnoud

Non-binary Codes for Correcting a Burst of at Most t Deletions

The problem of correcting deletions has received significant attention, partly because of the prevalence of these errors in DNA data storage. In this paper, we study the problem of correcting a consecutive burst of at most $t$ deletions in…

Information Theory · Computer Science 2022-10-24 Shuche Wang, Yuanyuan Tang, Jin Sima, Ryan Gabrys, Farzad Farnoud

Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms

The ability to store data in the DNA of a living organism has applications in a variety of areas including synthetic biology and watermarking of patented genetically-modified organisms. Data stored in this medium is subject to errors…

Information Theory · Computer Science 2016-11-17 Siddharth Jain, Farzad Farnoud, Moshe Schwartz, Jehoshua Bruck

Coding for Ordered Composite DNA Sequences

To increase the information capacity of DNA storage, composite DNA letters were introduced. We propose a novel channel model for composite DNA in which composite sequences are decomposed into ordered standard non-composite sequences. The…

Information Theory · Computer Science 2025-10-31 Besart Dollma, Ohad Elishco, Eitan Yaakobi

On the Maximum Number of Non-Confusable Strings Evolving Under Short Tandem Duplications

The set of all $q$-ary strings that do not contain repeated substrings of length $\leqslant 3$ (i.e., that do not contain substrings of the form $aa$, $abab$, and $abcabc$) constitutes a code correcting an arbitrary…

Information Theory · Computer Science 2022-07-01 Mladen Kovačević

Flexible RNA design under structure and sequence constraints using formal languages

The problem of RNA secondary structure design (also called inverse folding) is the following: given a target secondary structure, one aims to create a sequence that folds into, or is compatible with, a given structure. In several practical…

Quantitative Methods · Quantitative Biology 2013-08-02 Yu Zhou, Yann Ponty, Stéphane Vialette, Jérôme Waldispühl, Yi Zhang, Alain Denise

Data Deduplication with Random Substitutions

Data deduplication saves storage space by identifying and removing repeats in the data stream. Compared with traditional compression methods, data deduplication schemes are more time efficient and are thus widely used in large scale storage…

Information Theory · Computer Science 2022-05-30 Hao Lou, Farzad Farnoud