Related papers: On the Palindromic/Reverse-Complement Duplication …
We derive the coding capacity for duplication-correcting codes capable of correcting any number of duplications. We do so for both reverse-complement duplications and palindromic (reverse) duplications. We show that except for…
In this work, we propose constructions that correct duplications of multiple consecutive symbols. These errors are known as tandem duplications, where a sequence of symbols is repeated; respectively as palindromic duplications, where a…
A (tandem) duplication of length $k$ is an insertion of an exact copy of a substring of length $k$ next to its original position. This and related types of impairments are of relevance in modeling communication in the presence of…
We consider the problem of constructing a code capable of correcting a single long tandem duplication error of variable length. As the main contribution of this paper, we present a $q$-ary efficiently encodable code of length $n+1$ and…
Due to its higher data density, longevity, energy efficiency, and ease of generating copies, DNA is considered a promising storage technology for satisfying future needs. However, a diverse set of errors including deletions, insertions,…
In this work, we derive upper bounds on the cardinality of tandem duplication and palindromic deletion correcting codes by deriving the generalized sphere packing bound for these error types. We first prove that an upper bound for tandem…
Motivated by DNA storage in living organisms, and by known biological mutation processes, we study the reverse-complement string-duplication system. We fully classify the conditions under which the system has full expressiveness, for all…
We consider the problem of constructing binary codes to recover from $k$-bit deletions with efficient encoding/decoding, for a fixed $k$. The single deletion case is well understood, with the Varshamov-Tenengolts-Levenshtein code from 1965…
We consider the problem of designing low-redundancy codes in settings where one must correct deletions in conjunction with substitutions or adjacent transpositions; a combination of errors that is usually observed in DNA-based data storage.…
In this work, we investigate the problem of constructing codes capable of correcting two deletions. In particular, we construct a code that requires approximately 8 log n + O(log log n) bits of redundancy, where n is the length…
Identifying palindromes in sequences has been an interesting line of research in combinatorics on words and also in computational biology, after the discovery of the relation of palindromes in the DNA sequence with the HIV virus. Efficient…
Correcting insertions/deletions as well as substitution errors simultaneously plays an important role in DNA-based storage systems as well as in classical communications. This paper deals with the fundamental task of constructing codes that…
We study codes that can correct backtracking errors during nanopore sequencing. In this channel, a sequence of length $n$ over an alphabet of size $q$ is being read by a sliding window of length $\ell$, where from each window we obtain only…
A method to construct and count all the linear codes (of arbitrary length) in $\mathbb{F}_{4}$ that are invariant under reverse permutation and that contain the repetition code is presented. These codes are suitable for constructing DNA…
Levenshtein introduced the problem of constructing $k$-deletion correcting codes in 1966, proved that the optimal redundancy of those codes is $O(k\log N)$, and proposed an optimal redundancy single-deletion correcting code (using the…
Tandem duplication in DNA is the process of inserting a copy of a segment of DNA adjacent to the original position. Motivated by applications that store data in living organisms, Jain et al. (2016) proposed the study of codes that…
Recent work by Smagloy et al. (ISIT 2020) shows that the redundancy of a single-deletion $s$-substitution correcting code is asymptotically at least $(s+1)\log n+o(\log n)$, where $n$ is the length of the codes. They also provide a…
Because of its high data density and longevity, DNA is emerging as a promising candidate for satisfying increasing data storage needs. Compared to conventional storage media, however, data stored in DNA is subject to a wider range of errors…
The ability to store data in the DNA of a living organism has applications in a variety of areas including synthetic biology and watermarking of patented genetically-modified organisms. Data stored in this medium is subject to errors…
Codes in the Damerau--Levenshtein metric have been extensively studied recently owing to their applications in DNA-based data storage. In particular, Gabrys, Yaakobi, and Milenkovic (2017) designed a length-$n$ code correcting a single…
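
Several of the abstracts above refer to tandem, palindromic, and reverse-complement duplications. As a rough illustration only, the three error types can be sketched in Python as below. The function names, the 0-based start index `i`, and the convention that the copy is inserted immediately after the original substring are assumptions made for this sketch; they are not taken from any of the listed papers.

```python
# Illustrative sketch of three duplication error types on a DNA string.
# Assumption: the copy is inserted right after the duplicated substring
# s[i:i+k]; names and indexing are hypothetical, chosen for this example.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def _check(s: str, i: int, k: int) -> None:
    """Reject windows that do not fit inside the string."""
    if k <= 0 or i < 0 or i + k > len(s):
        raise ValueError("substring s[i:i+k] must lie inside s and be non-empty")


def tandem_duplication(s: str, i: int, k: int) -> str:
    """Insert an exact copy of s[i:i+k] next to the original."""
    _check(s, i, k)
    v = s[i:i + k]
    return s[:i + k] + v + s[i + k:]


def palindromic_duplication(s: str, i: int, k: int) -> str:
    """Insert the reversed copy of s[i:i+k] next to the original."""
    _check(s, i, k)
    v = s[i:i + k]
    return s[:i + k] + v[::-1] + s[i + k:]


def reverse_complement_duplication(s: str, i: int, k: int) -> str:
    """Insert the reversed, base-complemented copy of s[i:i+k]."""
    _check(s, i, k)
    v = s[i:i + k]
    rc = "".join(COMPLEMENT[c] for c in reversed(v))
    return s[:i + k] + rc + s[i + k:]


if __name__ == "__main__":
    s = "TACGG"
    print(tandem_duplication(s, 1, 2))              # TACACGG
    print(palindromic_duplication(s, 1, 2))         # TACCAGG
    print(reverse_complement_duplication(s, 1, 2))  # TACGTGG
```

In each case the output is exactly `k` symbols longer than the input; the three error types differ only in how the inserted copy is transformed.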