Capacity-Approaching Constrained Codes with Error Correction for DNA-Based Data Storage

Tuan Thanh Nguyen; Kui Cai; Kees A. Schouhamer Immink; Han Mao Kiah

Information Theory 2020-01-10 v1 math.IT

Abstract

We propose coding techniques that limit the length of homopolymer runs, enforce the GC-content constraint, and can correct a single edit error in strands of nucleotides in DNA-based data storage systems. In particular, for given $\ell, \epsilon > 0$, we propose simple and efficient encoders/decoders that transform binary sequences into DNA base sequences (codewords), namely sequences of the symbols A, T, C and G, that satisfy the following properties: (i) Runlength constraint: the maximum homopolymer run in each codeword has length at most $\ell$; (ii) GC-content constraint: the GC-content of each codeword lies within $[0.5-\epsilon, 0.5+\epsilon]$; (iii) Error correction: a single deletion, a single insertion, or a single substitution error in each codeword can be corrected. For practical values of $\ell$ and $\epsilon$, we show that our encoders achieve much higher rates than existing results in the literature and approach the capacity. Our methods have low encoding/decoding complexity and limited error propagation.
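As an illustration of properties (i) and (ii) only, the following minimal Python sketch checks whether a given strand satisfies the runlength and GC-content constraints. It is a constraint checker, not the encoders/decoders proposed in the paper, and it does not address property (iii). The function names and the parameter names `ell` and `eps` (standing in for $\ell$ and $\epsilon$) are assumptions made for this example.

```python
from itertools import groupby


def max_homopolymer_run(strand: str) -> int:
    """Return the length of the longest run of identical consecutive symbols."""
    # groupby splits the strand into maximal runs of equal symbols.
    return max((sum(1 for _ in run) for _, run in groupby(strand)), default=0)


def gc_content(strand: str) -> float:
    """Return the fraction of symbols that are G or C (0.0 for an empty strand)."""
    if not strand:
        return 0.0
    return sum(base in "GC" for base in strand) / len(strand)


def satisfies_constraints(strand: str, ell: int, eps: float) -> bool:
    """Check properties (i) and (ii): longest run <= ell and
    GC-content within [0.5 - eps, 0.5 + eps]."""
    return (max_homopolymer_run(strand) <= ell
            and abs(gc_content(strand) - 0.5) <= eps)
```

For example, `satisfies_constraints("ACGTACGT", 1, 0.1)` holds, since every run has length 1 and exactly half of the symbols are G or C, whereas `"AAAACGT"` fails for `ell = 3` because it contains a run of four A's.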

Keywords

error-correcting codes; source coding; decoding algorithm

Cite

@article{arxiv.2001.02839,
  title  = {Capacity-Approaching Constrained Codes with Error Correction for DNA-Based Data Storage},
  author = {Tuan Thanh Nguyen and Kui Cai and Kees A. Schouhamer Immink and Han Mao Kiah},
  journal = {arXiv preprint arXiv:2001.02839},
  year   = {2020}
}