Related papers: Computing Data Distribution from Query Selectiviti…

The Most Dispersed Subset of Random Points in $\mathbb{R}^d$

Consider a population of $N$ individuals, each having $d\geq 1$ different traits, and an additive measure, called dispersion, which rewards large pairwise separations between traits. The goal is to select $M\leq N$ individuals such that…

Statistical Mechanics · Physics 2026-05-01 Fabio Deelan Cunden, Noemi Cuppone, Giovanni Gramegna, Pierpaolo Vivo

Diversity Subsampling: Custom Subsamples from Large Data Sets

Subsampling from a large data set is useful in many supervised learning contexts to provide a global view of the data based on only a fraction of the observations. Diverse (or space-filling) subsampling is an appealing subsampling approach…

Methodology · Statistics 2023-11-27 Boyang Shang, Daniel W. Apley, Sanjay Mehrotra

Efficient and Private Approximations of Distributed Databases Calculations

In recent years, an increasing amount of data is collected in different and often non-cooperative databases. The problem of privacy-preserving distributed calculations over separate databases and the related issue of private data…

Databases · Computer Science 2016-05-23 Philip Derbeko, Shlomi Dolev, Ehud Gudes, Jeffrey D. Ullman

Parametric Scenario Optimization under Limited Data: A Distributionally Robust Optimization View

We consider optimization problems with uncertain constraints that need to be satisfied probabilistically. When data are available, a common method to obtain feasible solutions for such problems is to impose sampled constraints, following…

Optimization and Control · Mathematics 2020-07-09 Henry Lam, Fengpei Li

Monte Carlo convergence of rival samplers

It is often necessary to make sampling-based statistical inference about many probability distributions in parallel. Given a finite computational resource, this article addresses how to optimally divide sampling effort between the samplers…

Methodology · Statistics 2015-02-18 Nicholas Heard, Melissa Turcotte

Debiasing Guidance for Discrete Diffusion with Sequential Monte Carlo

Discrete diffusion models are a class of generative models that produce samples from an approximated data distribution within a discrete state space. Often, there is a need to target specific regions of the data distribution. Current…

Machine Learning · Computer Science 2025-09-03 Cheuk Kit Lee, Paul Jeha, Jes Frellsen, Pietro Lio, Michael Samuel Albergo, Francisco Vargas

A General Characterization of the Statistical Query Complexity

Statistical query (SQ) algorithms are algorithms that have access to an SQ oracle for the input distribution $D$ instead of i.i.d. samples from $D$. Given a query function $\phi:X \rightarrow [-1,1]$, the oracle returns an estimate…

Machine Learning · Computer Science 2017-04-18 Vitaly Feldman

On Adaptive Distance Estimation

We provide a static data structure for distance estimation which supports adaptive queries. Concretely, given a dataset $X = \{x_i\}_{i = 1}^n$ of $n$ points in $\mathbb{R}^d$ and $0 < p \leq 2$, we construct a randomized data…

Data Structures and Algorithms · Computer Science 2020-12-17 Yeshwanth Cherapanamjeri, Jelani Nelson

Differentially Private Sampling from Distributions

We initiate an investigation of private sampling from distributions. Given a dataset with $n$ independent observations from an unknown distribution $P$, a sampling algorithm must output a single observation from a distribution that is close…

Machine Learning · Computer Science 2022-11-16 Sofya Raskhodnikova, Satchit Sivakumar, Adam Smith, Marika Swanberg

Analytical Quantile Solution for the S-distribution, Random Number Generation and Statistical Data Modeling

The selection of a specific statistical distribution is seldom a simple problem. One strategy consists in testing different distributions (normal, lognormal, Weibull, etc.), and selecting the one providing the best fit to the observed data…

Statistics Theory · Mathematics 2019-10-14 Benito Hernández-Bermejo, Albert Sorribas

Statistical-Computational Trade-offs for Density Estimation

We study the density estimation problem defined as follows: given $k$ distributions $p_1, \ldots, p_k$ over a discrete domain $[n]$, as well as a collection of samples chosen from a "query" distribution $q$ over $[n]$, output $p_i$ that…

Data Structures and Algorithms · Computer Science 2024-10-31 Anders Aamand, Alexandr Andoni, Justin Y. Chen, Piotr Indyk, Shyam Narayanan, Sandeep Silwal, Haike Xu

Optimal Robust Learning of Discrete Distributions from Batches

Many applications, including natural language processing, sensor networks, collaborative filtering, and federated learning, call for estimating discrete distributions from data collected in batches, some of which may be untrustworthy,…

Machine Learning · Computer Science 2020-02-26 Ayush Jain, Alon Orlitsky

Range (Rényi) Entropy Queries and Partitioning

Data partitioning that maximizes/minimizes the Shannon entropy, or more generally the Rényi entropy, is a crucial subroutine in data compression, columnar storage, and cardinality estimation algorithms. These partition algorithms can be…

Data Structures and Algorithms · Computer Science 2025-11-05 Aryan Esmailpour, Sanjay Krishnan, Stavros Sintos

Optimal Algorithms for Augmented Testing of Discrete Distributions

We consider the problem of hypothesis testing for discrete distributions. In the standard model, where we have sample access to an underlying distribution $p$, extensive research has established optimal bounds for uniformity testing,…

Machine Learning · Computer Science 2024-12-03 Maryam Aliakbarpour, Piotr Indyk, Ronitt Rubinfeld, Sandeep Silwal

Optimal Database Allocation in Finite Time with Efficient Communication and Transmission Stopping over Dynamic Networks

In this paper, we focus on the problem of data sharing over a wireless computer network (i.e., a wireless grid). Given a set of available data, we present a distributed algorithm which operates over a dynamically changing network, and…

Systems and Control · Electrical Eng. & Systems 2022-07-19 Apostolos I. Rikos, Christoforos N. Hadjicostis, Karl H. Johansson

Consistent and Flexible Selectivity Estimation for High-Dimensional Data

Selectivity estimation aims at estimating the number of database objects that satisfy a selection criterion. Answering this problem accurately and efficiently is essential to many applications, such as density estimation, outlier detection,…

Databases · Computer Science 2021-05-28 Yaoshu Wang, Chuan Xiao, Jianbin Qin, Rui Mao, Onizuka Makoto, Wei Wang, Rui Zhang, Yoshiharu Ishikawa

Collection and Dissemination of Data on Time-Varying Digraphs

Given a network of fixed size $n$ and an initial distribution of data, we derive sufficient connectivity conditions on a sequence of time-varying digraphs for (a) data collection and (b) data dissemination, within at most $(n-1)$…

Systems and Control · Computer Science 2016-05-03 Kevin Topley

Product Distribution Field Theory

This paper presents a novel way to approximate a distribution governing a system of coupled particles with a product of independent distributions. The approach is an extension of mean field theory that allows the independent distributions…

Statistical Mechanics · Physics 2007-05-23 David H. Wolpert

Multi-Attribute Selectivity Estimation Using Deep Learning

Selectivity estimation - the problem of estimating the result size of queries - is a fundamental problem in databases. Accurate estimation of query selectivity involving multiple correlated attributes is especially challenging. Poor…

Databases · Computer Science 2019-06-19 Shohedul Hasan, Saravanan Thirumuruganathan, Jees Augustine, Nick Koudas, Gautam Das

Optimal Quantization for Distribution Synthesis

Finite precision approximations of discrete probability distributions are considered, applicable for distribution synthesis, e.g., probabilistic shaping. Two algorithms are presented that find the optimal $M$-type approximation $Q$ of a…

Information Theory · Computer Science 2017-05-08 Georg Böcherer, Bernhard C. Geiger