Reconstructing Strings from Substrings: Optimal Randomized and Average-Case Algorithms

Kazuo Iwama; Junichi Teruyama; Shuntaro Tsuyama

Reconstructing Strings from Substrings: Optimal Randomized and Average-Case Algorithms

Data Structures and Algorithms 2018-08-03 v1

Authors: Kazuo Iwama , Junichi Teruyama , Shuntaro Tsuyama

Abstract

The problem called "String reconstruction from substrings" is a mathematical model of sequencing by hybridization that plays an important role in DNA sequencing. In this problem, we are given a blackbox oracle holding an unknown string ${\mathcal X}$ and are required to obtain (reconstruct) ${\mathcal X}$ through "substring queries" $Q(S)$ . $Q(S)$ is given to the oracle with a string $S$ and the answer of the oracle is Yes if ${\mathcal X}$ includes $S$ as a substring and No otherwise. Our goal is to minimize the number of queries for the reconstruction. In this paper, we deal with only binary strings for ${\mathcal X}$ whose length $n$ is given in advance by using a sequence of good $S$ 's. In 1995, Skiena and Sundaram first studied this problem and obtained an algorithm whose query complexity is $n+O(\log n)$ . Its information theoretic lower bound is $n$ , and they posed an obvious open question; if we can remove the $O(\log n)$ additive term. No progress has been made until now. This paper gives two partially positive answers to this open question. One is a randomized algorithm whose query complexity is $n+O(1)$ with high probability and the other is an average-case algorithm also having a query complexity of $n+O(1)$ on average. The $n$ lower bound is still true for both cases, and hence they are optimal up to an additive constant.

Keywords

string algorithms succinct data structure error-correcting codes

Cite

@article{arxiv.1808.00674,
  title  = {Reconstructing Strings from Substrings: Optimal Randomized and Average-Case Algorithms},
  author = {Kazuo Iwama and Junichi Teruyama and Shuntaro Tsuyama},
  journal= {arXiv preprint arXiv:1808.00674},
  year   = {2018}
}

Comments

17 pages, 4 figures

Reconstructing Strings from Substrings: Optimal Randomized and Average-Case Algorithms

Abstract

Keywords

Cite

Comments

Related papers