Maximum Coverage in Sublinear Space, Faster

Data Structures and Algorithms 2023-12-13 v2

Abstract

Given a collection of $m$ sets from a universe $\mathcal{U}$, the Maximum Set Coverage problem consists of finding $k$ sets whose union has the largest cardinality. This problem is NP-hard, but the solution can be approximated by a polynomial-time algorithm up to a factor $1-1/e$. However, this algorithm does not scale well with the input size. In a streaming context, practical high-quality solutions are found, but with space complexity that scales linearly with respect to the size of the universe $|\mathcal{U}|$. However, one randomized streaming algorithm has been shown to produce a $1-1/e-\varepsilon$ approximation of the optimal solution with a space complexity that scales only poly-logarithmically with respect to $m$ and $|\mathcal{U}|$. To achieve such a low space complexity, its authors used a technique called subsampling, based on hash functions of limited independence, and $F_0$-sketching. This article focuses on this sublinear-space algorithm and introduces methods to reduce the time cost of subsampling. Firstly, we give some optimizations that do not alter the space complexity, number of passes and approximation quality of the original algorithm. In particular, we reanalyze the error bounds to show that the original independence factor of $\Omega(\varepsilon^{-2} k \log m)$ can be fine-tuned to $\Omega(k \log m)$. Secondly, we show that $F_0$-sketching can be replaced by a much simpler mechanism. Finally, our experimental results show that even a pairwise-independent hash-function sampler does not produce worse solutions than the original algorithm, while running faster by several orders of magnitude.
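To make the two ingredients named in the abstract concrete, the following is a minimal offline sketch in Python. It is not the authors' streaming algorithm. It only illustrates (a) subsampling the universe with a pairwise-independent hash function and (b) the classic greedy procedure that gives the $1-1/e$ factor. The function names, the modulus `MERSENNE_PRIME`, the `rate` parameter, and the assumption that elements are non-negative integers smaller than the modulus are all choices made for this illustration.

```python
import random

# Assumed modulus for the hash family; elements must be integers in [0, p).
MERSENNE_PRIME = (1 << 61) - 1


def make_pairwise_hash(rng):
    """Draw h(x) = (a*x + b) mod p with a, b uniform in [0, p).

    This family is pairwise independent over the integers modulo p.
    """
    a = rng.randrange(MERSENNE_PRIME)
    b = rng.randrange(MERSENNE_PRIME)
    return lambda x: (a * x + b) % MERSENNE_PRIME


def subsample(sets, h, rate):
    """Keep only the elements whose hash value falls below rate * p.

    Every set is filtered with the same hash function, so an element is
    either kept in all sets that contain it or dropped from all of them.
    """
    threshold = int(rate * MERSENNE_PRIME)
    return [{x for x in s if h(x) < threshold} for s in sets]


def greedy_max_coverage(sets, k):
    """Pick up to k sets, each time taking the largest marginal gain."""
    covered, chosen = set(), []
    for _ in range(min(k, len(sets))):
        candidates = [i for i in range(len(sets)) if i not in chosen]
        best = max(candidates, key=lambda i: len(sets[i] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered
```

One hypothetical way to use the sketch is to draw `h = make_pairwise_hash(random.Random(0))`, run `greedy_max_coverage(subsample(sets, h, 0.5), k)` to select indices on the reduced instance, and then measure the coverage of those indices on the original sets. The sampling rate that preserves the approximation guarantee depends on the analysis in the paper and is not reproduced here.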

Cite

@article{arxiv.2302.06137,
  title  = {Maximum Coverage in Sublinear Space, Faster},
  author = {Stephen Jaud and Anthony Wirth and Farhana Choudhury},
  journal= {arXiv preprint arXiv:2302.06137},
  year   = {2023}
}

Comments

12 pages, 7 figures