Related papers: Support Estimation with Sampling Artifacts and Err…
The support recovery problem consists of determining a sparse subset of a set of variables that is relevant in generating a set of observations, and arises in a diverse range of settings such as compressive sensing and subset selection in…
This paper presents a theoretical analysis of sample selection bias correction. The sample bias correction technique commonly used in machine learning consists of reweighting the cost of an error on each training point of a biased sample to…
We consider the problem of estimating the number of distinct elements in a large data set (or, equivalently, the support size of the distribution induced by the data set) from a random sample of its elements. The problem occurs in many…
Selection bias arises when the probability that an observation enters a dataset depends on variables related to the quantities of interest, leading to systematic distortions in estimation and uncertainty quantification. For example, in…
We introduce a new method for estimating the support size of an unknown distribution which provably matches the performance bounds of the state-of-the-art techniques in the area and outperforms them in practice. In particular, we present…
We propose a coupled bootstrap (CB) method for the test error of an arbitrary algorithm that estimates the mean in a Poisson sequence, often called the Poisson means problem. The idea behind our method is to generate two carefully-designed…
Bayesian hierarchical Poisson models are an essential tool for analyzing count data. However, designing efficient algorithms to sample from the posterior distribution of the target parameters remains a challenging task for this class of…
Compressed sensing deals with the reconstruction of sparse signals using a small number of linear measurements. One of the main challenges in compressed sensing is to find the support of a sparse signal. In the literature, several bounds on…
Intensity estimation for Poisson processes is a classical problem and has been extensively studied over the past few decades. Practical observations, however, often contain compositional noise, i.e. a nonlinear shift along the time axis,…
In machine learning models, the estimation of errors is often complex due to distribution bias, particularly in spatial data such as those found in environmental studies. We introduce an approach based on the ideas of importance sampling to…
Support points summarize a large dataset through a smaller set of representative points that can be used for data operations, such as Monte Carlo integration, without requiring access to the full dataset. In this sense, support points offer…
Bayesian analysis is increasingly popular for use in social science and other application areas where the data are observations from an informative sample. An informative sampling design leads to inclusion probabilities that are correlated…
Given a statistical model, we propose a novel estimation method that yields randomised estimators for the unknown distribution of an observed random variable. We establish non-asymptotic bounds for the performance of these estimators and…
Estimation of a deterministic quantity observed in non-Gaussian additive noise is explored via an order statistics approach. More specifically, we study the estimation problem when measurement noises either have positive supports or follow a…
Discrete biomarkers derived as cell densities or counts from tissue microarrays and immunostaining are widely used to study immune signatures in relation to survival outcomes in cancer. Although routinely collected, these signatures are not…
Implicit sampling is a weighted sampling method that is used in data assimilation, where one sequentially updates estimates of the state of a stochastic model based on a stream of noisy or incomplete data. Here we describe how to use…
Confirmation bias, the tendency to interpret information in a way that aligns with one's preconceptions, can profoundly impact scientific research, leading to conclusions that reflect the researcher's hypotheses even when the observational…
We consider a problem of statistical mean estimation in which the samples are not observed directly, but are instead observed by a relay ("teacher") that transmits information through a memoryless channel to the decoder ("student"), who…
We consider the problem of exact support recovery of sparse signals via noisy measurements. The main focus is the sufficient and necessary conditions on the number of measurements for support recovery to be reliable. By drawing an analogy…
Neural networks make accurate predictions but often fail to provide reliable uncertainty estimates, especially under covariate distribution shifts between training and testing. To address this problem, we propose a Bayesian framework for…