Related papers: Repeat-Free Codes

Codes Correcting a Single Long Duplication Error

We consider the problem of constructing a code capable of correcting a single long tandem duplication error of variable length. As the main contribution of this paper, we present a $q$-ary efficiently encodable code of length $n+1$ and…

Information Theory · Computer Science 2023-04-26 Daniil Goshkoder, Nikita Polyanskii, Ilya Vorobyev

Recoverable Systems

Motivated by the established notion of storage codes, we consider sets of infinite sequences over a finite alphabet such that every $k$-tuple of consecutive entries is uniquely recoverable from its $l$-neighborhood in the sequence. We…

Information Theory · Computer Science 2022-03-08 Ohad Elishco, Alexander Barg

Consistency of a Recurrent Language Model With Respect to Incomplete Decoding

Despite strong performance on a variety of tasks, neural sequence models trained with maximum likelihood have been shown to exhibit issues such as length bias and degenerate repetition. We study the related issue of receiving…

Machine Learning · Computer Science 2020-10-06 Sean Welleck, Ilia Kulikov, Jaedeok Kim, Richard Yuanzhe Pang, Kyunghyun Cho

Fundamental Limits of Multiple Sequence Reconstruction from Substrings

The problem of reconstructing a sequence from the set of its length-$k$ substrings has received considerable attention due to its various applications in genomics. We study an uncoded version of this problem where multiple random sources…

Information Theory · Computer Science 2023-05-11 Kel Levick, Ilan Shomorony

Efficient Low-Redundancy Codes for Correcting Multiple Deletions

We consider the problem of constructing binary codes to recover from $k$-bit deletions with efficient encoding/decoding, for a fixed $k$. The single deletion case is well understood, with the Varshamov-Tenengolts-Levenshtein code from 1965…

Information Theory · Computer Science 2019-05-21 Joshua Brakensiek, Venkatesan Guruswami, Samuel Zbarsky
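The single-deletion case mentioned above can be illustrated with the classical Varshamov-Tenengolts (VT) construction. A binary word $x_1 \dots x_n$ belongs to $VT_a(n)$ when $\sum_i i\,x_i \equiv a \pmod{n+1}$. The sketch below implements the standard textbook decoding rule for one deletion. It is not the multi-deletion construction of the paper, and the function names are hypothetical.

```python
from typing import List


def vt_syndrome(x: List[int]) -> int:
    """Weighted sum sum_i i * x_i, with 1-based positions."""
    return sum(i * bit for i, bit in enumerate(x, start=1))


def in_vt(x: List[int], a: int) -> bool:
    """True if x belongs to VT_a(n), where n = len(x)."""
    return vt_syndrome(x) % (len(x) + 1) == a


def vt_decode_deletion(y: List[int], a: int, n: int) -> List[int]:
    """Recover x in VT_a(n) from y, which is x with exactly one bit deleted."""
    w = sum(y)                          # weight of the received word
    d = (a - vt_syndrome(y)) % (n + 1)  # checksum deficiency
    if d <= w:
        # A 0 was deleted: reinsert it so that exactly d ones lie to its right.
        pos, ones = len(y), 0
        while ones < d:
            pos -= 1
            ones += y[pos]
        return y[:pos] + [0] + y[pos:]
    # A 1 was deleted: reinsert it so that exactly d - w - 1 zeros lie to its left.
    target, pos, zeros = d - w - 1, 0, 0
    while zeros < target:
        zeros += 1 - y[pos]
        pos += 1
    return y[:pos] + [1] + y[pos:]


# Usage: x = [1, 0, 0, 1] is in VT_0(4) because 1*1 + 4*1 = 5, which is 0 mod 5.
x = [1, 0, 0, 1]
assert in_vt(x, 0)
for p in range(len(x)):
    assert vt_decode_deletion(x[:p] + x[p + 1:], 0, len(x)) == x
```

The rule works because deleting a 0 lowers the weighted sum by the number of 1s to its right, which is at most $w$. Deleting a 1 lowers it by more than $w$, so the value of the deficiency tells the decoder which symbol was lost and where to put it back.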

Cut a Numeric String into Required Pieces

We study the problem of cutting a length-$n$ string of positive real numbers into $k$ pieces so that every piece has sum at least $b$. The problem can also be phrased as transforming such a string into a new one by merging adjacent numbers.…

Data Structures and Algorithms · Computer Science 2023-09-29 Yinqi Cai
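The basic feasibility question in the snippet above asks whether a string of positive reals can be cut into $k$ contiguous pieces, each with sum at least $b$. A simple greedy rule answers it: close each of the first $k-1$ pieces as soon as its running sum reaches $b$, then check whether the remainder also reaches $b$. Because all entries are positive, closing pieces as early as possible never hurts. This is only an illustrative sketch of the problem statement, not the paper's algorithm, and `cut_into_pieces` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def cut_into_pieces(xs: Sequence[float], k: int, b: float) -> Optional[List[List[float]]]:
    """Split xs into k contiguous pieces, each with sum >= b, or return None.

    Assumes every entry of xs is positive and k >= 1.
    """
    pieces: List[List[float]] = []
    current: List[float] = []
    running = 0.0
    for x in xs:
        current.append(x)
        running += x
        # Close a piece as soon as it reaches b, but only for the first k - 1
        # pieces; everything after that is absorbed into the last piece.
        if running >= b and len(pieces) < k - 1:
            pieces.append(current)
            current, running = [], 0.0
    # The remainder is the k-th piece and must also reach the threshold.
    if len(pieces) == k - 1 and current and running >= b:
        pieces.append(current)
        return pieces
    return None
```

For example, `cut_into_pieces([1, 2, 3, 4], 2, 4)` yields `[[1, 2, 3], [4]]`. Asking for three pieces with the same threshold returns `None`, since the total of 10 is less than $3 \cdot 4$.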

Fundamental Limits of Reference-Based Sequence Reordering

The problem of reconstructing a sequence of independent and identically distributed symbols from a set of equal-size consecutive fragments, as well as a dependent reference sequence, is considered. First, in the regime in which the…

Information Theory · Computer Science 2023-07-20 Nir Weinberger, Ilan Shomorony

Improved Upper Bound for the Redundancy of Fix-Free Codes

A variable-length code is a fix-free code if no codeword is a prefix or a suffix of any other codeword. In a fix-free code any finite sequence of codewords can be decoded in both directions, which can improve the robustness to channel noise…

Information Theory · Computer Science 2007-07-13 Sergey Yekhanin
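The fix-free condition stated above can be checked directly on a finite set of codewords. The sketch assumes codewords are given as strings, and `is_fix_free` is a hypothetical helper. It tests only the defining property and does not bound redundancy.

```python
from typing import Iterable


def is_fix_free(codewords: Iterable[str]) -> bool:
    """Return True if no codeword is a prefix or a suffix of another codeword."""
    words = list(codewords)
    for i, u in enumerate(words):
        for j, v in enumerate(words):
            # Compare every ordered pair of distinct codewords.
            if i != j and (v.startswith(u) or v.endswith(u)):
                return False
    return True
```

For instance, `["0", "11", "101"]` passes the check. The code `["0", "10"]` is prefix-free but not fix-free, because `"0"` is a suffix of `"10"`, so it cannot be decoded unambiguously from right to left.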

Optimizing run-length algorithm using octonary repetition tree

Compression is beneficial because it helps reduce resource usage. It reduces data storage space as well as transmission traffic and improves web page loading. Run-length coding (RLC) is a lossless data compression algorithm. Data are…

Data Structures and Algorithms · Computer Science 2016-11-30 Kaveh Geyratmand Haghighi, Mirkamal Mirnia, Ahmad Habibizad Navin
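For reference, plain run-length coding as named in the snippet above replaces each maximal run of a repeated symbol with a (symbol, run length) pair. The sketch below is this baseline scheme only, not the octonary repetition tree proposed in the paper, and the function names are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(data: str) -> List[Tuple[str, int]]:
    """Encode each maximal run of a symbol as a (symbol, run_length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]


def rle_decode(pairs: List[Tuple[str, int]]) -> str:
    """Invert rle_encode by expanding every (symbol, run_length) pair."""
    return "".join(symbol * count for symbol, count in pairs)
```

`rle_encode("aaabccdddd")` gives `[("a", 3), ("b", 1), ("c", 2), ("d", 4)]`, and `rle_decode` restores the original string. The scheme is lossless, but it only saves space when runs are long.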

Compressing Sets and Multisets of Sequences

This article describes lossless compression algorithms for multisets of sequences, taking advantage of the multiset's unordered structure. Multisets are a generalisation of sets where members are allowed to occur multiple times. A multiset…

Information Theory · Computer Science 2014-01-27 Christian Steinruecken

Recursive Decoding and Its Performance for Low-Rate Reed-Muller Codes

Recursive decoding techniques are considered for Reed-Muller (RM) codes of growing length $n$ and fixed order $r$. An algorithm is designed that has complexity of order $n\log n$ and corrects most error patterns of weight up to…

Information Theory · Computer Science 2017-03-17 Ilya Dumer

Coded trace reconstruction in a constant number of traces

The coded trace reconstruction problem asks to construct a code $C\subset \{0,1\}^n$ such that any $x\in C$ is recoverable from independent outputs ("traces") of $x$ from a binary deletion channel (BDC). We present binary codes of rate…

Information Theory · Computer Science 2020-09-15 Joshua Brakensiek, Ray Li, Bruce Spang

Non-redundant random generation algorithms for weighted context-free languages

We address the non-redundant random generation of $k$ words of length $n$ in a context-free language. Additionally, we want to avoid a predefined set of words. We study a rejection-based approach, whose worst-case time complexity is shown…

Formal Languages and Automata Theory · Computer Science 2012-11-05 Andy Lorenz, Yann Ponty

Optimal Codes Correcting Localized Deletions

We consider the problem of constructing codes that can correct deletions that are localized within a certain part of the codeword that is unknown a priori. Namely, the model that we study is when at most $k$ deletions occur in a window of…

Information Theory · Computer Science 2021-05-07 Rawad Bitar, Serge Kas Hanna, Nikita Polyanskii, Ilya Vorobyev

Convertible Codes: Efficient Conversion of Coded Data in Distributed Storage

Large-scale distributed storage systems typically use erasure codes to provide durability of data in the face of failures. A set of $k$ blocks to be stored is encoded using an $[n, k]$ code to generate $n$ blocks that are then stored on…

Information Theory · Computer Science 2019-07-31 Francisco Maturana, K. V. Rashmi

On Secure Network Coding with Nonuniform or Restricted Wiretap Sets

The secrecy capacity of a network, for a given collection of permissible wiretap sets, is the maximum rate of communication such that observing links in any permissible wiretap set reveals no information about the message. This paper…

Information Theory · Computer Science 2016-11-17 Tao Cui, Tracey Ho, Joerg Kliewer

Universal Source Coding for Monotonic and Fast Decaying Monotonic Distributions

We study universal compression of sequences generated by monotonic distributions. We show that for a monotonic distribution over an alphabet of size $k$, each probability parameter costs essentially $0.5 \log (n/k^3)$ bits, where $n$ is the…

Information Theory · Computer Science 2007-07-13 Gil I. Shamir

On the Coding Capacity of Reverse-Complement and Palindromic Duplication-Correcting Codes

We derive the coding capacity for duplication-correcting codes capable of correcting any number of duplications. We do so both for reverse-complement duplications, as well as palindromic (reverse) duplications. We show that except for…

Information Theory · Computer Science 2024-02-21 Lev Yohananov, Moshe Schwartz

Deciding the Confusability of Words under Tandem Repeats

Tandem duplication in DNA is the process of inserting a copy of a segment of DNA adjacent to the original position. Motivated by applications that store data in living organisms, Jain et al. (2016) proposed the study of codes that…

Combinatorics · Mathematics 2017-11-20 Yeow Meng Chee, Johan Chrisnata, Han Mao Kiah, Tuan Thanh Nguyen
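The tandem duplication operation described above inserts a copy of a segment immediately after that segment. The sketch below models it on strings and lists every word reachable by one duplication of a fixed length $k$. The check `confusable_one_step` is a deliberately simplified, hypothetical notion limited to a single length-$k$ duplication applied to each word. It is much weaker than the confusability question studied in the paper, and all function names are illustrative.

```python
from typing import Set


def tandem_duplicate(x: str, i: int, k: int) -> str:
    """Insert a copy of the length-k segment x[i:i+k] right after that segment."""
    if k < 1 or i < 0 or i + k > len(x):
        raise ValueError("segment x[i:i+k] must lie inside x")
    return x[: i + k] + x[i : i + k] + x[i + k :]


def single_duplication_descendants(x: str, k: int) -> Set[str]:
    """All words obtainable from x by exactly one tandem duplication of length k."""
    return {tandem_duplicate(x, i, k) for i in range(len(x) - k + 1)}


def confusable_one_step(x: str, y: str, k: int) -> bool:
    """Simplified check: do x and y share a descendant after one length-k duplication?"""
    return bool(single_duplication_descendants(x, k) & single_duplication_descendants(y, k))
```

For example, `tandem_duplicate("abc", 1, 2)` duplicates `"bc"` and returns `"abcbc"`. The words `"aab"` and `"abb"` are confusable under this simplified check, because both can become `"aabb"`.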

Set Shaping Theory and the Foundations of Redundancy-Free Testable Codes

To render a sequence testable, namely capable of identifying and detecting errors, it is necessary to apply a transformation that increases its length by introducing statistical dependence among symbols, as commonly exemplified by the…

Information Theory · Computer Science 2025-07-08 Aida Koch, Alix Petit