String Attractors: Verification and Optimization
Abstract
String attractors [STOC 2018] are combinatorial objects recently introduced to unify all known dictionary compression techniques in a single theory. A set $\Gamma \subseteq [1..n]$ is a $k$-attractor for a string $S \in [1..\sigma]^n$ if and only if every distinct substring of $S$ of length at most $k$ has an occurrence straddling at least one of the positions in $\Gamma$. Finding the smallest $k$-attractor is NP-hard for $k \geq 3$, but polylogarithmic approximations can be found using reductions from dictionary compressors. It is easy to reduce the $k$-attractor problem to a set-cover instance where the string's positions are interpreted as sets of substrings. The main result of this paper is a much more powerful reduction based on the truncated suffix tree. Our new characterization of the problem leads to more efficient algorithms for string attractors: we show how to check the validity and minimality of a $k$-attractor in near-optimal time and how to quickly compute exact and approximate solutions. For example, we prove that a minimum $3$-attractor can be found in optimal $O(n)$ time when $\sigma \in O(\sqrt[3+\epsilon]{\log n})$ for any constant $\epsilon > 0$, and a $2.45$-approximation can be computed in $O(n)$ time on general alphabets. To conclude, we introduce and study the complexity of the closely related sharp-$k$-attractor problem: to find the smallest set of positions capturing all distinct substrings of length exactly $k$. We show that the problem is in P for $k = 1, 2$ and is NP-complete for constant $k \geq 3$.
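To make the definition concrete, the following is a minimal Python sketch of a naive $k$-attractor check, together with a brute-force search for a smallest attractor on very short strings. It is not the truncated-suffix-tree method of the paper: it enumerates substrings directly and takes time far from the bounds stated above. The function names (`substring_occurrences`, `is_k_attractor`, `min_k_attractor_bruteforce`) and the use of 0-based positions are assumptions made for illustration.

```python
from itertools import combinations


def substring_occurrences(s, k):
    """Map each distinct substring of length <= k to the list of
    0-based start indices of its occurrences in s."""
    occ = {}
    n = len(s)
    for length in range(1, min(k, n) + 1):
        for i in range(n - length + 1):
            occ.setdefault(s[i:i + length], []).append(i)
    return occ


def is_k_attractor(s, positions, k):
    """Return True if every distinct substring of s of length <= k has
    at least one occurrence covering a position in `positions`.

    Naive illustration only (0-based positions are an assumption)."""
    gamma = set(positions)
    for sub, starts in substring_occurrences(s, k).items():
        m = len(sub)
        # An occurrence starting at i covers positions i .. i + m - 1.
        covered = any(
            any(p in gamma for p in range(i, i + m)) for i in starts
        )
        if not covered:
            return False
    return True


def min_k_attractor_bruteforce(s, k):
    """Try candidate position sets in order of increasing size and return
    the first valid one. Exponential time: only for tiny inputs."""
    n = len(s)
    for size in range(n + 1):
        for cand in combinations(range(n), size):
            if is_k_attractor(s, cand, k):
                return set(cand)
    return set(range(n))  # not reached: the full position set is always valid
```

For instance, on the string `abab` with $k = 2$, the set `{1, 2}` is a valid attractor under this check while `{0}` is not, since no occurrence of `b` covers position 0.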
Cite
@article{arxiv.1803.01695,
title = {String Attractors: Verification and Optimization},
author = {Dominik Kempa and Alberto Policriti and Nicola Prezza and Eva Rotenberg},
journal= {arXiv preprint arXiv:1803.01695},
year = {2020}
}