Very Fast Streaming Submodular Function Maximization

Machine Learning 2021-05-11 v5 Discrete Mathematics Machine Learning

Abstract

Data summarization has become a valuable tool for understanding even terabytes of data. Due to their compelling theoretical properties, submodular functions have been the focus of summarization algorithms. These algorithms offer worst-case approximation guarantees at the expense of higher computation and memory requirements. However, many practical applications do not fall under this worst case and are usually much more well-behaved. In this paper, we propose a new submodular function maximization algorithm called ThreeSieves, which ignores the worst case but delivers a good solution with high probability. It selects the most informative items from a data stream on the fly and maintains provable performance under a fixed memory budget. In an extensive evaluation, we compare our method against 6 other methods on 8 different datasets with and without concept drift. We show that our algorithm outperforms current state-of-the-art algorithms while using fewer resources. Finally, we highlight a real-world use case of our algorithm for data summarization in gamma-ray astronomy. We make our code publicly available at https://github.com/sbuschjaeger/SubmodularStreamingMaximization.
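To make the setting concrete, the following is a minimal sketch of selecting items from a stream under a fixed memory budget by thresholding marginal gains of a submodular function. It is not the ThreeSieves algorithm itself, whose details the abstract does not give. The names `coverage` and `stream_select`, the fixed `threshold`, and the toy data are all assumptions made for illustration.

```python
from typing import Callable, FrozenSet, Iterable, List

Item = FrozenSet[int]


def coverage(summary: List[Item]) -> float:
    """Hypothetical submodular function: number of distinct elements covered."""
    covered = set()
    for item in summary:
        covered |= item
    return float(len(covered))


def stream_select(
    stream: Iterable[Item],
    f: Callable[[List[Item]], float],
    k: int,
    threshold: float,
) -> List[Item]:
    """Single pass over the stream, keeping at most k items.

    An item is kept only if its marginal gain f(S + [e]) - f(S) reaches
    the (assumed, fixed) threshold. Memory use is bounded by k items.
    """
    summary: List[Item] = []
    current = f(summary)
    for item in stream:
        if len(summary) >= k:
            break  # memory budget exhausted
        gain = f(summary + [item]) - current
        if gain >= threshold:
            summary.append(item)
            current += gain
    return summary


# Toy stream of sets; each set is one "item" arriving in order.
stream = [
    frozenset({1, 2, 3}),
    frozenset({2, 3}),
    frozenset({4, 5}),
    frozenset({1}),
    frozenset({6, 7, 8}),
]
summary = stream_select(stream, coverage, k=2, threshold=2.0)
```

In this toy run the second item is rejected because it adds nothing new to the coverage, which illustrates how a marginal-gain test filters redundant items without ever storing the full stream. Choosing the threshold well is the hard part in practice, and the sketch simply assumes a value is given.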

Cite

@article{arxiv.2010.10059,
  title  = {Very Fast Streaming Submodular Function Maximization},
  author = {Sebastian Buschjäger and Philipp-Jan Honysz and Lukas Pfahler and Katharina Morik},
  journal= {arXiv preprint arXiv:2010.10059},
  year   = {2021}
}

Comments

9 pages, 14 pages appendix, 5 figures, 2 tables, 10 algorithms
