String Attractors: Verification and Optimization

Data Structures and Algorithms 2020-12-09 v2

Abstract

String attractors [STOC 2018] are combinatorial objects recently introduced to unify all known dictionary compression techniques in a single theory. A set $\Gamma\subseteq [1..n]$ is a $k$-attractor for a string $S\in[1..\sigma]^n$ if and only if every distinct substring of $S$ of length at most $k$ has an occurrence straddling at least one of the positions in $\Gamma$. Finding the smallest $k$-attractor is NP-hard for $k\geq 3$, but polylogarithmic approximations can be found using reductions from dictionary compressors. It is easy to reduce the $k$-attractor problem to a set-cover instance in which the string's positions are interpreted as sets of substrings. The main result of this paper is a much more powerful reduction based on the truncated suffix tree. Our new characterization of the problem leads to more efficient algorithms for string attractors: we show how to check the validity and minimality of a $k$-attractor in near-optimal time and how to quickly compute exact and approximate solutions. For example, we prove that a minimum $3$-attractor can be found in optimal $O(n)$ time when $\sigma\in O(\sqrt[3+\epsilon]{\log n})$ for any constant $\epsilon>0$, and that a $2.45$-approximation can be computed in $O(n)$ time on general alphabets. To conclude, we introduce and study the complexity of the closely related sharp-$k$-attractor problem: finding the smallest set of positions capturing all distinct substrings of length exactly $k$. We show that the problem is in P for $k=1,2$ and is NP-complete for constant $k\geq 3$.
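
To make the definition and the simple set-cover view concrete, the following is a minimal brute-force sketch in Python. It is not the truncated-suffix-tree algorithm of the paper: the function names `position_sets` and `is_k_attractor` are hypothetical, positions are assumed to be 1-based as in the abstract, and the running time is roughly $O(nk^2)$ substring operations rather than near-optimal.

```python
from typing import Dict, Iterable, Set


def position_sets(s: str, k: int) -> Dict[int, Set[str]]:
    """Set-cover view: map each 1-based position to the substrings of
    length at most k that have an occurrence containing that position."""
    n = len(s)
    cover: Dict[int, Set[str]] = {p: set() for p in range(1, n + 1)}
    for length in range(1, min(k, n) + 1):
        for start in range(n - length + 1):  # 0-based start of the occurrence
            sub = s[start:start + length]
            # The occurrence spans 1-based positions start+1 .. start+length.
            for p in range(start + 1, start + length + 1):
                cover[p].add(sub)
    return cover


def is_k_attractor(s: str, gamma: Iterable[int], k: int) -> bool:
    """Check the definition directly: every distinct substring of length
    at most k must have an occurrence containing a position of gamma."""
    cover = position_sets(s, k)
    universe: Set[str] = set().union(*cover.values())
    captured: Set[str] = set()
    for p in gamma:
        captured |= cover.get(p, set())  # positions outside 1..n capture nothing
    return captured == universe


if __name__ == "__main__":
    print(is_k_attractor("abab", {2, 3}, 2))
    print(is_k_attractor("abab", {2}, 2))
```

For the string `abab` and $k=2$, the distinct substrings are `a`, `b`, `ab`, and `ba`. The set $\{2,3\}$ captures all of them, while $\{2\}$ misses `a`, whose occurrences lie at positions 1 and 3. Choosing a smallest family of the sets returned by `position_sets` that covers the universe is exactly the set-cover instance mentioned above.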

Cite

@article{arxiv.1803.01695,
  title  = {String Attractors: Verification and Optimization},
  author = {Dominik Kempa and Alberto Policriti and Nicola Prezza and Eva Rotenberg},
  journal = {arXiv preprint arXiv:1803.01695},
  year   = {2020}
}