Related papers: Efficient Truncated Statistics with Unknown Trunca…
Accurate assessment of systematic uncertainties is an increasingly vital task in physics studies, where large, high-dimensional datasets, like those collected at the Large Hadron Collider, hold the key to new discoveries. Common approaches…
Consider the problem of estimating the mean of a Gaussian random vector when the mean vector is assumed to be in a given convex set. The most natural solution is to take the Euclidean projection of the data vector on to this convex set; in…
We show a statistical version of Taylor's theorem and apply this result to non-parametric density estimation from truncated samples, which is a classical challenge in Statistics \cite{woodroofe1985estimating, stute1993almost}. The…
We study a distributed estimation problem in which two remotely located parties, Alice and Bob, observe an unlimited number of i.i.d. samples corresponding to two different parts of a random vector. Alice can send $k$ bits on average to…
We consider the estimation of a sparse parameter vector from measurements corrupted by white Gaussian noise. Our focus is on unbiased estimation as a setting under which the difficulty of the problem can be quantified analytically. We show…
The majority of research on efficient and scalable algorithms in computational science and engineering has focused on the forward problem: given parameter inputs, solve the governing equations to determine output quantities of interest. In…
We consider the problem of model selection in Gaussian Markov fields in the sample deficient scenario. The benchmark information-theoretic results in the case of d-regular graphs require the number of samples to be at least proportional to…
Parametric stochastic simulators are ubiquitous in science, often featuring high-dimensional input parameters and/or an intractable likelihood. Performing Bayesian parameter inference in this context can be challenging. We present a neural…
Robust mean estimation is one of the most important problems in statistics: given a set of samples in $\mathbb{R}^d$ where an $\alpha$ fraction are drawn from some distribution $D$ and the rest are adversarially corrupted, we aim to…
Randomized algorithms, such as randomized sketching or stochastic optimization, are a promising approach to ease the computational burden in analyzing large datasets. However, randomized algorithms also produce non-deterministic outputs,…
Low-rank tensor models are widely used in statistics. However, most existing methods rely heavily on the assumption that data follows a sub-Gaussian distribution. To address the challenges associated with heavy-tailed distributions…
The article starts with new aliasing-truncation error upper bounds in the sampling theorem for non-bandlimited stochastic signals. Then, it investigates $L_p([0,T])$ approximations of sub-Gaussian random signals. Explicit truncation error…
In high-dimensional Bayesian statistics, various methods have been developed, including prior distributions that induce parameter sparsity to handle many parameters. Yet, these approaches often overlook the rich spectral structure of the…
We consider the problem of identifying the parameters of an unknown mixture of two arbitrary $d$-dimensional gaussians from a sequence of independent random samples. Our main results are upper and lower bounds giving a computationally…
The truncated plurigaussian model is often used to simulate the spatial distribution of random categorical variables such as geological facies. The problems addressed in this paper are the estimation of parameters of the truncation map for…
Algorithmic Gaussianization is a phenomenon that can arise when using randomized sketching or sampling methods to produce smaller representations of large datasets: For certain tasks, these sketched representations have been observed to…
We introduce the truncated Gaussian graphical model (TGGM) as a novel framework for designing statistical models for nonlinear learning. A TGGM is a Gaussian graphical model (GGM) with a subset of variables truncated to be nonnegative. The…
Coarse data arise when learners observe only partial information about samples; namely, a set containing the sample rather than its exact value. This occurs naturally through measurement rounding, sensor limitations, and lag in economic…
Scalable Gaussian Process methods are computationally attractive, yet introduce modeling biases that require rigorous study. This paper analyzes two common techniques: early truncated conjugate gradients (CG) and random Fourier features…
We study the problem of list-decodable Gaussian mean estimation and the related problem of learning mixtures of separated spherical Gaussians. We develop a set of techniques that yield new efficient algorithms with significantly improved…