Capacity-Approaching Constrained Codes with Error Correction for DNA-Based Data Storage
Abstract
We propose coding techniques that limit the length of homopolymer runs, ensure the GC-content constraint, and are capable of correcting a single edit error in strands of nucleotides in DNA-based data storage systems. In particular, for given ℓ, ε > 0, we propose simple and efficient encoders/decoders that transform binary sequences into DNA base sequences (codewords), namely sequences of the symbols A, T, C and G, that satisfy the following properties: (i) Runlength constraint: the maximum homopolymer run in each codeword is at most ℓ, (ii) GC-content constraint: the GC-content of each codeword is within [0.5 - ε, 0.5 + ε], (iii) Error-correction: each codeword is capable of correcting a single deletion, or single insertion, or single substitution error. For practical values of ℓ and ε, we show that our encoders achieve much higher rates than existing results in the literature and approach the capacity. Our methods have low encoding/decoding complexity and limited error propagation.
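Properties (i) and (ii) can be illustrated with a small checker. The following is a minimal sketch, not the paper's encoder or decoder: the function names and the parameters `max_run` (the run-length bound) and `eps` (the GC tolerance) are hypothetical, and the GC window is assumed to be centered at 0.5.

```python
from itertools import groupby


def max_homopolymer_run(strand: str) -> int:
    """Return the length of the longest run of identical consecutive symbols."""
    return max((sum(1 for _ in group) for _, group in groupby(strand)), default=0)


def gc_content(strand: str) -> float:
    """Return the fraction of symbols that are G or C (0.0 for an empty strand)."""
    if not strand:
        return 0.0
    return sum(1 for symbol in strand if symbol in "GC") / len(strand)


def satisfies_constraints(strand: str, max_run: int, eps: float) -> bool:
    """Check the run-length and GC-content constraints for one strand.

    Assumption: the GC-content window is [0.5 - eps, 0.5 + eps].
    """
    return (
        max_homopolymer_run(strand) <= max_run
        and abs(gc_content(strand) - 0.5) <= eps
    )
```

For example, `satisfies_constraints("AAAACGT", 3, 0.5)` fails because the strand contains a run of four A symbols, while `"ACGTACGT"` passes even with `max_run = 1`. The checker only verifies a strand and says nothing about property (iii), the single-edit error correction.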
Cite
@article{arxiv.2001.02839,
title = {Capacity-Approaching Constrained Codes with Error Correction for DNA-Based Data Storage},
author = {Tuan Thanh Nguyen and Kui Cai and Kees A. Schouhamer Immink and Han Mao Kiah},
journal = {arXiv preprint arXiv:2001.02839},
year = {2020}
}