Related papers: A Sampling Algebra for Aggregate Estimation

Optimal Sensing and Data Estimation in a Large Sensor Network

Energy-efficient use of large-scale sensor networks necessitates activating a subset of possible sensors for estimation at a fusion center. The problem is inherently combinatorial; to this end, a set of iterative, randomized algorithms…

Information Theory · Computer Science 2017-09-13 Arpan Chattopadhyay, Urbashi Mitra

Gradient and Uncertainty Enhanced Sequential Sampling for Global Fit

Surrogate models based on machine learning methods have become an important part of modern engineering to replace costly computer simulations. The data used for creating a surrogate model are essential for the model accuracy and often…

Machine Learning · Statistics 2023-10-03 Sven Lämmle, Can Bogoclu, Kevin Cremanns, Dirk Roos

A Survey of Distributed Data Aggregation Algorithms

Distributed data aggregation is an important task, allowing the decentralized determination of meaningful global properties that can then be used to direct the execution of other applications. The resulting values result from the…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-10-05 Paulo Jesus, Carlos Baquero, Paulo Sérgio Almeida

On the variance of subset sum estimation

For high volume data streams and large data warehouses, sampling is used for efficient approximate answers to aggregate queries over selected subsets. Mathematically, we are dealing with a set of weighted items and want to support queries…

Data Structures and Algorithms · Computer Science 2007-05-23 Mario Szegedy, Mikkel Thorup

Generalized entropy calibration for analyzing voluntary survey data

Statistical analysis of voluntary survey data is an important area of research in survey sampling. We consider a unified approach to voluntary survey data analysis under the assumption that the sampling mechanism is ignorable. Generalized…

Methodology · Statistics 2025-06-03 Yonghyun Kwon, Jae Kwang Kim, Yumou Qiu

PAC-Bayes Analysis for Recalibration in Classification

Nonparametric estimation using uniform-width binning is a standard approach for evaluating the calibration performance of machine learning models. However, existing theoretical analyses of the bias induced by binning are limited to binary…

Machine Learning · Computer Science 2025-07-14 Masahiro Fujisawa, Futoshi Futami

Generalized Robust Bayesian Committee Machine for Large-scale Gaussian Process Regression

In order to scale standard Gaussian process (GP) regression to large-scale datasets, aggregation models employ a factorized training process and then combine predictions from distributed experts. The state-of-the-art aggregation models,…

Machine Learning · Statistics 2018-06-05 Haitao Liu, Jianfei Cai, Yi Wang, Yew-Soon Ong

On sampling from data with duplicate records

Data deduplication is the task of detecting records in a database that correspond to the same real-world entity. Our goal is to develop a procedure that samples uniformly from the set of entities present in the database in the presence of…

Machine Learning · Computer Science 2020-08-25 Alireza Heidari, Shrinu Kushagra, Ihab F. Ilyas

Efficient Estimation of Generalization Error and Bias-Variance Components of Ensembles

For many applications, an ensemble of base classifiers is an effective solution. The tuning of its parameters (number of classes, amount of data on which each classifier is to be trained, etc.) requires G, the generalization error of a…

Machine Learning · Computer Science 2017-11-16 Dhruv Mahajan, Vivek Gupta, S Sathiya Keerthi, Sellamanickam Sundararajan, Shravan Narayanamurthy, Rahul Kidambi

GROS: A General Robust Aggregation Strategy

A new, very general, robust procedure for combining estimators in metric spaces, GROS, is introduced. The method is reminiscent of the well-known median of means, as described in \cite{devroye2016sub}. Initially, the sample is divided into…

Statistics Theory · Mathematics 2024-02-26 Alejandro Cholaquidis, Emilien Joly, Leonardo Moreno

A Statistical Approach Towards Robust Progress Estimation

The need for accurate SQL progress estimation in the context of decision support administration has led to a number of techniques proposed for this task. Unfortunately, no single one of these progress estimators behaves robustly across the…

Databases · Computer Science 2012-01-04 Arnd Christian König, Bolin Ding, Surajit Chaudhuri, Vivek Narasayya

A Simple and Efficient Sampling-based Algorithm for General Reachability Analysis

In this work, we analyze an efficient sampling-based algorithm for general-purpose reachability analysis, which remains a notoriously challenging problem with applications ranging from neural network verification to safety analysis of…

Systems and Control · Electrical Eng. & Systems 2022-04-15 Thomas Lew, Lucas Janson, Riccardo Bonalli, Marco Pavone

Effective Sampling: Fast Segmentation Using Robust Geometric Model Fitting

Identifying the underlying models in a set of data points contaminated by noise and outliers leads to a highly complex multi-model fitting problem. This problem can be posed as a clustering problem by the projection of higher order…

Computer Vision and Pattern Recognition · Computer Science 2018-08-01 Ruwan Tennakoon, Alireza Sadri, Reza Hoseinnezhad, Alireza Bab-Hadiashar

A Distributed Block-Split Gibbs Sampler with Hypergraph Structure for High-Dimensional Inverse Problems

Sampling-based algorithms are classical approaches to perform Bayesian inference in inverse problems. They provide estimators with the associated credibility intervals to quantify the uncertainty on the estimators. Although these methods…

Methodology · Statistics 2023-11-28 Pierre-Antoine Thouvenin, Audrey Repetti, Pierre Chainais

Ensemble Sampling

Thompson sampling has emerged as an effective heuristic for a broad range of online decision problems. In its basic form, the algorithm requires computing and sampling from a posterior distribution over models, which is tractable only for…

Machine Learning · Statistics 2023-04-26 Xiuyuan Lu, Benjamin Van Roy

Data Distribution Valuation Using Generalized Bayesian Inference

We investigate the data distribution valuation problem, which aims to quantify the values of data distributions from their samples. This is a recently proposed problem that is related to but different from classical data valuation and can…

Machine Learning · Computer Science 2026-04-08 Cuong N. Nguyen, Cuong V. Nguyen

Learning from Summarized Data: Gaussian Process Regression with Sample Quasi-Likelihood

Gaussian process regression is a powerful Bayesian nonlinear regression method. Recent research has enabled the capture of many types of observations using non-Gaussian likelihoods. To deal with various tasks in spatial modeling, we benefit…

Machine Learning · Statistics 2025-08-26 Yuta Shikuri

On the Subbagging Estimation for Massive Data

This article introduces subbagging (subsample aggregating) estimation approaches for big data analysis with memory constraints of computers. Specifically, for the whole dataset with size $N$, $m_N$ subsamples are randomly drawn, and each…

Methodology · Statistics 2021-03-05 Tao Zou, Xian Li, Xuan Liang, Hansheng Wang

Uniform Sampling for Matrix Approximation

Random sampling has become a critical tool in solving massive matrix problems. For linear regression, a small, manageable set of data rows can be randomly selected to approximate a tall, skinny data matrix, improving processing time…

Data Structures and Algorithms · Computer Science 2014-08-22 Michael B. Cohen, Yin Tat Lee, Cameron Musco, Christopher Musco, Richard Peng, Aaron Sidford

Understanding Uncertainty Sampling via Equivalent Loss

Uncertainty sampling is a prevalent active learning algorithm that sequentially queries the annotations of data samples about which the current prediction model is uncertain. However, the usage of uncertainty sampling has been largely…

Machine Learning · Computer Science 2026-04-08 Shang Liu, Xiaocheng Li