Related papers: Repeat-Free Codes
We consider the problem of constructing a code capable of correcting a single long tandem duplication error of variable length. As the main contribution of this paper, we present a $q$-ary efficiently encodable code of length $n+1$ and…
Motivated by the established notion of storage codes, we consider sets of infinite sequences over a finite alphabet such that every $k$-tuple of consecutive entries is uniquely recoverable from its $l$-neighborhood in the sequence. We…
Despite strong performance on a variety of tasks, neural sequence models trained with maximum likelihood have been shown to exhibit issues such as length bias and degenerate repetition. We study the related issue of receiving…
The problem of reconstructing a sequence from the set of its length-$k$ substrings has received considerable attention due to its various applications in genomics. We study an uncoded version of this problem where multiple random sources…
We consider the problem of constructing binary codes to recover from $k$-bit deletions with efficient encoding/decoding, for a fixed $k$. The single deletion case is well understood, with the Varshamov-Tenengolts-Levenshtein code from 1965…
We study the problem of cutting a length-$n$ string of positive real numbers into $k$ pieces so that every piece has sum at least $b$. The problem can also be phrased as transforming such a string into a new one by merging adjacent numbers.…
The problem of reconstructing a sequence of independent and identically distributed symbols from a set of equal size, consecutive, fragments, as well as a dependent reference sequence, is considered. First, in the regime in which the…
A variable-length code is a fix-free code if no codeword is a prefix or a suffix of any other codeword. In a fix-free code any finite sequence of codewords can be decoded in both directions, which can improve the robustness to channel noise…
Compression is beneficial because it helps detract resource usage. It reduces data storage space as well as transmission traffic and improves web pages loading. Run-length coding (RLC) is a lossless data compression algorithm. Data are…
This article describes lossless compression algorithms for multisets of sequences, taking advantage of the multiset's unordered structure. Multisets are a generalisation of sets where members are allowed to occur multiple times. A multiset…
Recursive decoding techniques are considered for Reed-Muller (RM) codes of growing length $n$ and fixed order $r.$ An algorithm is designed that has complexity of order $n\log n$ and corrects most error patterns of weight up to…
The coded trace reconstruction problem asks to construct a code $C\subset \{0,1\}^n$ such that any $x\in C$ is recoverable from independent outputs ("traces") of $x$ from a binary deletion channel (BDC). We present binary codes of rate…
We address the non-redundant random generation of $k$ words of length $n$ in a context-free language. Additionally, we want to avoid a predefined set of words. We study a rejection-based approach, whose worst-case time complexity is shown…
We consider the problem of constructing codes that can correct deletions that are localized within a certain part of the codeword that is unknown a priori. Namely, the model that we study is when at most $k$ deletions occur in a window of…
Large-scale distributed storage systems typically use erasure codes to provide durability of data in the face of failures. A set of $k$ blocks to be stored is encoded using an $[n, k]$ code to generate $n$ blocks that are then stored on…
The secrecy capacity of a network, for a given collection of permissible wiretap sets, is the maximum rate of communication such that observing links in any permissible wiretap set reveals no information about the message. This paper…
We study universal compression of sequences generated by monotonic distributions. We show that for a monotonic distribution over an alphabet of size $k$, each probability parameter costs essentially $0.5 \log (n/k^3)$ bits, where $n$ is the…
We derive the coding capacity for duplication-correcting codes capable of correcting any number of duplications. We do so both for reverse-complement duplications, as well as palindromic (reverse) duplications. We show that except for…
Tandem duplication in DNA is the process of inserting a copy of a segment of DNA adjacent to the original position. Motivated by applications that store data in living organisms, Jain {\em et al.} (2016) proposed the study of codes that…
To render a sequence testable, namely capable of identifying and detecting errors, it is necessary to apply a transformation that increases its length by introducing statistical dependence among symbols, as commonly exemplified by the…