Distribution Testing Meets Sum Estimation

Data Structures and Algorithms 2025-04-22 v1

Abstract

We study the problem of estimating the sum of $n$ elements, each with weight $w(i)$, in a structured universe $U$. Our goal is to estimate $W = \sum_{i=1}^n w(i)$ within a $(1 \pm \epsilon)$ factor using a sublinear number of samples, assuming the weights are non-increasing, i.e., $w(1) \geq w(2) \geq \dots \geq w(n)$. The sum estimation problem is well studied under different access models to the universe $U$. However, to the best of our knowledge, nothing is known about sum estimation using non-adaptive conditional sampling. In this work, we explore the sum estimation problem using non-adaptive conditional weighted and non-adaptive conditional uniform samples, assuming that the underlying distribution ($D(i) = w(i)/W$) is monotone. We also extend our approach to the case where the underlying distribution of $U$ is unimodal. Additionally, we consider support size estimation when $w(i) = 0$ or $w(i) \geq W/n$, using hybrid sampling (both weighted and uniform) to access $U$. We propose an algorithm to estimate $W$ under the non-increasing weight assumption, using $O(\frac{1}{\epsilon^3} \log{n} + \frac{1}{\epsilon^6})$ non-adaptive weighted conditional samples and $O(\frac{1}{\epsilon^3} \log{n})$ uniform conditional samples. Our algorithm matches the $\Omega(\log{n})$ lower bound by \cite{ACK15}. For unimodal distributions, the sample complexity remains similar, with an additional $O(\log{n})$ evaluation queries to locate the minimum-weight point in the domain.
For estimating the support size $k$ of $U$, where weights are either $0$ or at least $W/n$, our algorithm uses $O\big( \frac{\log^3(n/\epsilon)}{\epsilon^8} \cdot \log^4 \frac{\log(n/\epsilon)}{\epsilon} \big)$ uniform samples and $O\big( \frac{\log(n/\epsilon)}{\epsilon^2} \cdot \log \frac{\log(n/\epsilon)}{\epsilon} \big)$ weighted samples to output $\hat{k}$ satisfying $k - 2\epsilon n \leq \hat{k} \leq k + \epsilon n$.
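
To make the setup concrete, here is a minimal Python sketch of the access model and the accuracy guarantees stated above. It is not the paper's algorithm. The function names, the 0-based indexing, and the choice to raise an error on a zero-weight query set are all assumptions made for illustration. The oracles follow the usual definition of conditional sampling: a query names a subset $S$ of the universe, and the answer is an element of $S$ drawn either proportionally to its weight or uniformly.

```python
import random

def is_non_increasing(w):
    """Check the monotonicity assumption w(1) >= w(2) >= ... >= w(n)."""
    return all(w[i] >= w[i + 1] for i in range(len(w) - 1))

def weighted_cond_sample(w, S, rng):
    """Weighted conditional sample: return i in S with probability w[i] / w(S).

    Raising on w(S) == 0 is an assumed convention, not taken from the paper.
    """
    S = list(S)
    weights = [w[i] for i in S]
    if sum(weights) == 0:
        raise ValueError("query set has zero total weight")
    return rng.choices(S, weights=weights, k=1)[0]

def uniform_cond_sample(S, rng):
    """Uniform conditional sample: return an element of S uniformly at random."""
    return rng.choice(list(S))

def within_factor(estimate, W, eps):
    """Sum-estimation guarantee: (1 - eps) * W <= estimate <= (1 + eps) * W."""
    return (1 - eps) * W <= estimate <= (1 + eps) * W

def support_estimate_ok(k_hat, k, n, eps):
    """Support-size guarantee: k - 2*eps*n <= k_hat <= k + eps*n."""
    return k - 2 * eps * n <= k_hat <= k + eps * n

if __name__ == "__main__":
    rng = random.Random(0)
    w = [8, 4, 2, 1, 1]              # hypothetical non-increasing weights
    W = sum(w)                       # the quantity to be estimated
    print(is_non_increasing(w), W)
    print(weighted_cond_sample(w, [1, 3], rng))
    print(uniform_cond_sample([0, 2, 4], rng))
    print(within_factor(17, W, 0.1))
```

In the non-adaptive setting, all query sets $S$ are fixed before any sample is observed, so an algorithm in this model would build its full list of sets first and only then call the oracles.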

Cite

@article{arxiv.2504.15153,
  title  = {Distribution Testing Meets Sum Estimation},
  author = {Pinki Pradhan and Sampriti Roy},
  journal= {arXiv preprint arXiv:2504.15153},
  year   = {2025}
}