Optimal Dynamic Parameterized Subset Sampling
Abstract
In this paper, we study the Dynamic Parameterized Subset Sampling (DPSS) problem in the Word RAM model. In DPSS, the input is a set, $S$, of $n$ items, where each item, $x$, has a non-negative integer weight, $w(x)$. Given a pair of query parameters, $(\alpha, \beta)$, each of which is a non-negative rational number, a parameterized subset sampling query on $S$ seeks to return a subset $T \subseteq S$ such that each item $x \in S$ is selected in $T$, independently, with probability $p_x(\alpha, \beta) = \min\left\{\frac{w(x)}{\alpha \sum_{x \in S} w(x) + \beta}, 1\right\}$. More specifically, the DPSS problem is defined in a dynamic setting, where the item set, $S$, can be updated with insertions of new items or deletions of existing items. Our first main result is an optimal algorithm for solving the DPSS problem, which achieves $O(n)$ pre-processing time, $O(1 + \mu_S(\alpha, \beta))$ expected time for each query parameterized by $(\alpha, \beta)$, given on-the-fly, and $O(1)$ time for each update; here, $\mu_S(\alpha, \beta)$ is the expected size of the query result $T$. At all times, the worst-case space consumption of our algorithm is linear in the current number of items in $S$. Our second main contribution is a hardness result for the DPSS problem when the item weights are $O(1)$-word float numbers, rather than integers. Specifically, we reduce Integer Sorting to the deletion-only DPSS problem with float item weights. Our reduction shows that an optimal algorithm for deletion-only DPSS with float item weights (achieving all of the bounds stated above) would yield an optimal algorithm for Integer Sorting. The latter remains an important open problem. Last but not least, a key technical ingredient for our first main result is an efficient algorithm for generating Truncated Geometric random variates in $O(1)$ expected time in the Word RAM model.
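To make the query semantics concrete, the following is a minimal baseline sketch in Python, not the paper's algorithm. It assumes the selection probability of an item $x$ is $\min\{w(x) / (\alpha W + \beta), 1\}$, where $W$ is the total weight of the current item set. The class name `NaiveDPSS` and its methods are hypothetical, and the sketch treats a zero denominator as an error, which is also an assumption. Each query scans every item, so it takes time linear in the number of items rather than the output-sensitive bound described above.

```python
import random
from fractions import Fraction


class NaiveDPSS:
    """Hypothetical baseline: linear-time queries, constant-time updates."""

    def __init__(self, weights=None):
        self.weights = {}  # item -> non-negative integer weight
        self.total = 0     # running sum of all item weights
        for item, w in (weights or {}).items():
            self.insert(item, w)

    def insert(self, item, w):
        if w < 0 or item in self.weights:
            raise ValueError("weight must be non-negative and item must be new")
        self.weights[item] = w
        self.total += w

    def delete(self, item):
        # Removes the item and keeps the running total consistent.
        self.total -= self.weights.pop(item)

    def probability(self, item, alpha, beta):
        # Exact rational arithmetic; assumed formula min{w / (alpha*W + beta), 1}.
        denom = Fraction(alpha) * self.total + Fraction(beta)
        if denom == 0:
            raise ValueError("alpha * total + beta must be positive")
        return min(Fraction(self.weights[item]) / denom, Fraction(1))

    def query(self, alpha, beta, rng=random):
        # One independent coin flip per item: this scan is what an
        # output-sensitive structure avoids.
        return {x for x in self.weights
                if rng.random() < self.probability(x, alpha, beta)}


s = NaiveDPSS({"a": 1, "b": 3, "c": 0})
sample = s.query(Fraction(1, 2), 1)  # "b" is always selected; "a" with probability 1/3
```

Because `rng.random()` returns a value in $[0, 1)$, items with probability 1 are always selected and items with weight 0 never are, which makes the boundary cases easy to check by hand.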
Cite
@article{arxiv.2409.18036,
title = {Optimal Dynamic Parameterized Subset Sampling},
author = {Junhao Gan and Seeun William Umboh and Hanzhi Wang and Anthony Wirth and Zhuo Zhang},
journal = {arXiv preprint arXiv:2409.18036},
year = {2024}
}
Comments
29 pages, 10 figures, to be published in PODS25