Related papers: Repeat-Free Codes

200 papers

We consider the problem of constructing a code capable of correcting a single long tandem duplication error of variable length. As the main contribution of this paper, we present a $q$-ary efficiently encodable code of length $n+1$ and…

Information Theory · Computer Science 2023-04-26 Daniil Goshkoder, Nikita Polyanskii, Ilya Vorobyev

Motivated by the established notion of storage codes, we consider sets of infinite sequences over a finite alphabet such that every $k$-tuple of consecutive entries is uniquely recoverable from its $l$-neighborhood in the sequence. We…

Information Theory · Computer Science 2022-03-08 Ohad Elishco, Alexander Barg

Despite strong performance on a variety of tasks, neural sequence models trained with maximum likelihood have been shown to exhibit issues such as length bias and degenerate repetition. We study the related issue of receiving…

Machine Learning · Computer Science 2020-10-06 Sean Welleck, Ilia Kulikov, Jaedeok Kim, Richard Yuanzhe Pang, Kyunghyun Cho

The problem of reconstructing a sequence from the set of its length-$k$ substrings has received considerable attention due to its various applications in genomics. We study an uncoded version of this problem where multiple random sources…

Information Theory · Computer Science 2023-05-11 Kel Levick, Ilan Shomorony

We consider the problem of constructing binary codes to recover from $k$-bit deletions with efficient encoding/decoding, for a fixed $k$. The single deletion case is well understood, with the Varshamov-Tenengolts-Levenshtein code from 1965…

Information Theory · Computer Science 2019-05-21 Joshua Brakensiek, Venkatesan Guruswami, Samuel Zbarsky

We study the problem of cutting a length-$n$ string of positive real numbers into $k$ pieces so that every piece has sum at least $b$. The problem can also be phrased as transforming such a string into a new one by merging adjacent numbers.…

Data Structures and Algorithms · Computer Science 2023-09-29 Yinqi Cai

The problem of reconstructing a sequence of independent and identically distributed symbols from a set of equal size, consecutive, fragments, as well as a dependent reference sequence, is considered. First, in the regime in which the…

Information Theory · Computer Science 2023-07-20 Nir Weinberger, Ilan Shomorony

A variable-length code is a fix-free code if no codeword is a prefix or a suffix of any other codeword. In a fix-free code any finite sequence of codewords can be decoded in both directions, which can improve the robustness to channel noise…

Information Theory · Computer Science 2007-07-13 Sergey Yekhanin

Compression is beneficial because it helps reduce resource usage. It reduces data storage space as well as transmission traffic and improves web page loading. Run-length coding (RLC) is a lossless data compression algorithm. Data are…

Data Structures and Algorithms · Computer Science 2016-11-30 Kaveh Geyratmand Haghighi, Mirkamal Mirnia, Ahmad Habibizad Navin

This article describes lossless compression algorithms for multisets of sequences, taking advantage of the multiset's unordered structure. Multisets are a generalisation of sets where members are allowed to occur multiple times. A multiset…

Information Theory · Computer Science 2014-01-27 Christian Steinruecken

Recursive decoding techniques are considered for Reed-Muller (RM) codes of growing length $n$ and fixed order $r.$ An algorithm is designed that has complexity of order $n\log n$ and corrects most error patterns of weight up to…

Information Theory · Computer Science 2017-03-17 Ilya Dumer

The coded trace reconstruction problem asks to construct a code $C\subset \{0,1\}^n$ such that any $x\in C$ is recoverable from independent outputs ("traces") of $x$ from a binary deletion channel (BDC). We present binary codes of rate…

Information Theory · Computer Science 2020-09-15 Joshua Brakensiek, Ray Li, Bruce Spang

We address the non-redundant random generation of $k$ words of length $n$ in a context-free language. Additionally, we want to avoid a predefined set of words. We study a rejection-based approach, whose worst-case time complexity is shown…

Formal Languages and Automata Theory · Computer Science 2012-11-05 Andy Lorenz, Yann Ponty

We consider the problem of constructing codes that can correct deletions that are localized within a certain part of the codeword that is unknown a priori. Namely, the model that we study is when at most $k$ deletions occur in a window of…

Information Theory · Computer Science 2021-05-07 Rawad Bitar, Serge Kas Hanna, Nikita Polyanskii, Ilya Vorobyev

Large-scale distributed storage systems typically use erasure codes to provide durability of data in the face of failures. A set of $k$ blocks to be stored is encoded using an $[n, k]$ code to generate $n$ blocks that are then stored on…

Information Theory · Computer Science 2019-07-31 Francisco Maturana, K. V. Rashmi

The secrecy capacity of a network, for a given collection of permissible wiretap sets, is the maximum rate of communication such that observing links in any permissible wiretap set reveals no information about the message. This paper…

Information Theory · Computer Science 2016-11-17 Tao Cui, Tracey Ho, Joerg Kliewer

We study universal compression of sequences generated by monotonic distributions. We show that for a monotonic distribution over an alphabet of size $k$, each probability parameter costs essentially $0.5 \log (n/k^3)$ bits, where $n$ is the…

Information Theory · Computer Science 2007-07-13 Gil I. Shamir

We derive the coding capacity for duplication-correcting codes capable of correcting any number of duplications. We do so both for reverse-complement duplications, as well as palindromic (reverse) duplications. We show that except for…

Information Theory · Computer Science 2024-02-21 Lev Yohananov, Moshe Schwartz

Tandem duplication in DNA is the process of inserting a copy of a segment of DNA adjacent to the original position. Motivated by applications that store data in living organisms, Jain et al. (2016) proposed the study of codes that…

Combinatorics · Mathematics 2017-11-20 Yeow Meng Chee, Johan Chrisnata, Han Mao Kiah, Tuan Thanh Nguyen

To render a sequence testable, namely capable of identifying and detecting errors, it is necessary to apply a transformation that increases its length by introducing statistical dependence among symbols, as commonly exemplified by the…

Information Theory · Computer Science 2025-07-08 Aida Koch, Alix Petit