Related papers: Support Estimation with Sampling Artifacts and Err…

Limits on Support Recovery with Probabilistic Models: An Information-Theoretic Framework

The support recovery problem consists of determining a sparse subset of a set of variables that is relevant in generating a set of observations, and arises in a diverse range of settings such as compressive sensing, and subset selection in…

Information Theory · Computer Science 2016-08-31 Jonathan Scarlett , Volkan Cevher

Sample Selection Bias Correction Theory

This paper presents a theoretical analysis of sample selection bias correction. The sample bias correction technique commonly used in machine learning consists of reweighting the cost of an error on each training point of a biased sample to…

Machine Learning · Computer Science 2008-12-18 Corinna Cortes , Mehryar Mohri , Michael Riley , Afshin Rostamizadeh

Learning-based Support Estimation in Sublinear Time

We consider the problem of estimating the number of distinct elements in a large data set (or, equivalently, the support size of the distribution induced by the data set) from a random sample of its elements. The problem occurs in many…

Machine Learning · Computer Science 2021-06-17 Talya Eden , Piotr Indyk , Shyam Narayanan , Ronitt Rubinfeld , Sandeep Silwal , Tal Wagner

Overcoming Selection Bias in Statistical Studies With Amortized Bayesian Inference

Selection bias arises when the probability that an observation enters a dataset depends on variables related to the quantities of interest, leading to systematic distortions in estimation and uncertainty quantification. For example, in…

Machine Learning · Statistics 2026-04-21 Jonas Arruda , Sophie Chervet , Paula Staudt , Andreas Wieser , Michael Hoelscher , Isabelle Sermet-Gaudelus , Nadine Binder , Lulla Opatowski , Jan Hasenauer

Regularized Weighted Chebyshev Approximations for Support Estimation

We introduce a new method for estimating the support size of an unknown distribution which provably matches the performance bounds of the state-of-the-art techniques in the area and outperforms them in practice. In particular, we present…

Machine Learning · Statistics 2019-10-22 I Chien , Olgica Milenkovic

Unbiased Test Error Estimation in the Poisson Means Problem via Coupled Bootstrap Techniques

We propose a coupled bootstrap (CB) method for the test error of an arbitrary algorithm that estimates the mean in a Poisson sequence, often called the Poisson means problem. The idea behind our method is to generate two carefully-designed…

Methodology · Statistics 2024-08-20 Natalia L. Oliveira , Jing Lei , Ryan J. Tibshirani

A note on auxiliary mixture sampling for Bayesian Poisson models

Bayesian hierarchical Poisson models are an essential tool for analyzing count data. However, designing efficient algorithms to sample from the posterior distribution of the target parameters remains a challenging task for this class of…

Methodology · Statistics 2025-02-10 Aldo Gardini , Fedele Greco , Carlo Trivisano

An Estimation Theoretic Approach for Sparsity Pattern Recovery in the Noisy Setting

Compressed sensing deals with the reconstruction of sparse signals using a small number of linear measurements. One of the main challenges in compressed sensing is to find the support of a sparse signal. In the literature, several bounds on…

Information Theory · Computer Science 2009-11-26 Ali Hormati , Amin Karbasi , Soheil Mohajer , Martin Vetterli

Intensity Estimation for Poisson Process with Compositional Noise

Intensity estimation for Poisson processes is a classical problem and has been extensively studied over the past few decades. Practical observations, however, often contain compositional noise, i.e. a nonlinear shift along the time axis,…

Methodology · Statistics 2019-09-25 Glenna Schluck , Wei Wu , Anuj Srivastava

Correcting sampling biases via importance reweighting for spatial modeling

In machine learning models, the estimation of errors is often complex due to distribution bias, particularly in spatial data such as those found in environmental studies. We introduce an approach based on the ideas of importance sampling to…

Machine Learning · Computer Science 2023-09-15 Boris Prokhorov , Diana Koldasbayeva , Alexey Zaytsev

Weighted Support Points from Random Measures: An Interpretable Alternative for Generative Modeling

Support points summarize a large dataset through a smaller set of representative points that can be used for data operations, such as Monte Carlo integration, without requiring access to the full dataset. In this sense, support points offer…

Machine Learning · Statistics 2025-09-01 Peiqi Zhao , Carlos E. Rodríguez , Ramsés H. Mena , Stephen G. Walker

Bayesian Estimation Under Informative Sampling

Bayesian analysis is increasingly popular for use in social science and other application areas where the data are observations from an informative sample. An informative sampling design leads to inclusion probabilities that are correlated…

Statistics Theory · Mathematics 2016-06-07 Terrance D. Savitsky , Daniell Toth

Statistical Inference via T-Posterior Randomised Estimators

Given a statistical model, we propose a novel estimation method that yields randomised estimators for the unknown distribution of an observed random variable. We establish non-asymptotic bounds for the performance of these estimators and…

Statistics Theory · Mathematics 2026-05-06 Yannick Baraud

Exploring Positive Noise in Estimation Theory

Estimation of a deterministic quantity observed in non-Gaussian additive noise is explored via order statistics approach. More specifically, we study the estimation problem when measurement noises either have positive supports or follow a…

Signal Processing · Electrical Eng. & Systems 2020-07-15 Kamiar Radnosrati , Gustaf Hendeby , Fredrik Gustafsson

Survival Analysis with Discrete Biomarkers Under a Semiparametric Bayesian Conditional Poisson Model

Discrete biomarkers derived as cell densities or counts from tissue microarrays and immunostaining are widely used to study immune signatures in relation to survival outcomes in cancer. Although routinely collected, these signatures are not…

Methodology · Statistics 2025-09-24 Aijun Yang , Phineas T. Hamilton , Brad H. Nelson , Julian J. Lum , Mary Lesperance , Farouk S. Nathoo

Parameter estimation by implicit sampling

Implicit sampling is a weighted sampling method that is used in data assimilation, where one sequentially updates estimates of the state of a stochastic model based on a stream of noisy or incomplete data. Here we describe how to use…

Numerical Analysis · Mathematics 2016-01-20 Matthias Morzfeld , Xuemin Tu , Jon Wilkening , Alexandre J. Chorin

Confirmation Bias in Gaussian Mixture Models

Confirmation bias, the tendency to interpret information in a way that aligns with one's preconceptions, can profoundly impact scientific research, leading to conclusions that reflect the researcher's hypotheses even when the observational…

Machine Learning · Statistics 2025-09-09 Amnon Balanov , Tamir Bendory , Wasim Huleihel

Statistical Mean Estimation with Coded Relayed Observations

We consider a problem of statistical mean estimation in which the samples are not observed directly, but are instead observed by a relay ("teacher") that transmits information through a memoryless channel to the decoder ("student"), who…

Information Theory · Computer Science 2025-05-15 Yan Hao Ling , Zhouhao Yang , Jonathan Scarlett

Support Recovery of Sparse Signals

We consider the problem of exact support recovery of sparse signals via noisy measurements. The main focus is the sufficient and necessary conditions on the number of measurements for support recovery to be reliable. By drawing an analogy…

Information Theory · Computer Science 2010-03-04 Yuzhe Jin , Young-Han Kim , Bhaskar D. Rao

Quantifying Uncertainty in the Presence of Distribution Shifts

Neural networks make accurate predictions but often fail to provide reliable uncertainty estimates, especially under covariate distribution shifts between training and testing. To address this problem, we propose a Bayesian framework for…

Machine Learning · Statistics 2025-12-22 Yuli Slavutsky , David M. Blei