Space-Efficient Algorithms for Computing Minimal/Shortest Unique Substrings

Data Structures and Algorithms 2020-09-15 v4

Abstract

Given a string $T$ of length $n$, a substring $u = T[i..j]$ of $T$ is called a shortest unique substring (SUS) for an interval $[s,t]$ if (a) $u$ occurs exactly once in $T$, (b) $u$ contains the interval $[s,t]$ (i.e. $i \leq s \leq t \leq j$), and (c) every substring $v$ of $T$ with $|v| < |u|$ containing $[s,t]$ occurs at least twice in $T$. Given a query interval $[s, t] \subset [1, n]$, the interval SUS problem is to output all the SUSs for the interval $[s,t]$. In this article, we propose a $4n + o(n)$ bits data structure answering an interval SUS query in output-sensitive $O(\mathit{occ})$ time, where $\mathit{occ}$ is the number of returned SUSs. Additionally, we focus on the point SUS problem, which is the interval SUS problem for $s = t$. Here, we propose a $\lceil (\log_2{3} + 1)n \rceil + o(n)$ bits data structure answering a point SUS query in the same output-sensitive time. We also propose space-efficient algorithms for computing the minimal unique substrings of $T$.
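To make conditions (a) to (c) of the SUS definition concrete, the following is a minimal brute-force sketch in Python. It is not the space-efficient data structure proposed in the article; it only enumerates candidate substrings by increasing length and returns the first non-empty set of unique ones. The function names `count_occurrences` and `interval_sus` are hypothetical, and positions are 1-based and inclusive, as in the abstract.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def interval_sus(text, s, t):
    """Return all SUSs for the interval [s, t] as 1-based inclusive (i, j) pairs.

    Brute-force illustration of the definition only (hypothetical helper,
    not the article's algorithm).
    """
    n = len(text)
    if not (1 <= s <= t <= n):
        raise ValueError("expected 1 <= s <= t <= len(text)")
    # Try candidate lengths from the interval length upwards; the first
    # length with a unique candidate is the SUS length (condition (c)).
    for length in range(t - s + 1, n + 1):
        found = []
        # Condition (b): i <= s and j = i + length - 1 >= t, with j <= n.
        lo = max(1, t - length + 1)
        hi = min(s, n - length + 1)
        for i in range(lo, hi + 1):
            j = i + length - 1
            # Condition (a): T[i..j] occurs exactly once in T.
            if count_occurrences(text, text[i - 1:j]) == 1:
                found.append((i, j))
        if found:
            return found
    return []  # unreachable for non-empty text: T itself is always unique


# Point SUS query (s = t = 3) on "banana": both "ban" and "nan" qualify.
print(interval_sus("banana", 3, 3))  # [(1, 3), (3, 5)]
```

The loop always terminates with an answer because the whole string $T[1..n]$ occurs exactly once in $T$ and contains every interval. The sketch recounts occurrences for every candidate, so it is far from the output-sensitive query time stated above; it is meant only as a reference for checking small examples by hand.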

Cite

@article{arxiv.1905.12854,
  title  = {Space-Efficient Algorithms for Computing Minimal/Shortest Unique Substrings},
  author = {Takuya Mieno and Dominik Köppl and Yuto Nakashima and Shunsuke Inenaga and Hideo Bannai and Masayuki Takeda},
  journal= {arXiv preprint arXiv:1905.12854},
  year   = {2020}
}