Related papers: Mean Estimation from Coarse Data: Characterization…
For many learning problems one may not have access to fine grained label information; e.g., an image can be labeled as husky, dog, or even animal depending on the expertise of the annotator. In this work, we formalize these settings and…
Estimation of the covariance matrix has attracted a lot of attention of the statistical research community over the years, partially due to important applications such as Principal Component Analysis. However, frequently used empirical…
We study the algorithmic problem of robust mean estimation of an identity covariance Gaussian in the presence of mean-shift contamination. In this contamination model, we are given a set of points in $\mathbb{R}^d$ generated i.i.d. via the…
We study mean estimation for a Gaussian distribution with identity covariance in $\mathbb{R}^d$ under a missing data scheme termed realizable $\epsilon$-contamination model. In this model an adversary can choose a function $r(x)$ between 0…
We consider the problem of estimating the mean and covariance of a distribution from iid samples in $\mathbb{R}^n$, in the presence of an $\eta$ fraction of malicious noise; this is in contrast to much recent work where the noise itself is…
Multivariate Gaussian is often used as a first approximation to the distribution of high-dimensional data. Determining the parameters of this distribution under various constraints is a widely studied problem in statistics, and is often…
Common workflows in machine learning and statistics rely on the ability to partition the information in a data set into independent portions. Recent work has shown that this may be possible even when conventional sample splitting is not…
We introduce a new method for estimating the mean of an outcome variable within groups when researchers only observe the average of the outcome and group indicators across a set of aggregation units, such as geographical areas. Existing…
We consider computationally-efficient estimation of population parameters when observations are subject to missing data. In particular, we consider estimation under the realizable contamination model of missing data in which an $\epsilon$…
We consider the problem of mean estimation assuming only finite variance. We study a new class of mean estimators constructed by integrating over random noise applied to a soft-truncated empirical mean estimator. For appropriate choices of…
This paper focuses on the estimation of the sample covariance matrix from low-dimensional random projections of data known as compressive measurements. In particular, we present an unbiased estimator to extract the covariance structure from…
The goal of this paper is to show that a single robust estimator of the mean of a multivariate Gaussian distribution can enjoy five desirable properties. First, it is computationally tractable in the sense that it can be computed in a time…
Consider the problem of estimating the mean of a Gaussian random vector when the mean vector is assumed to be in a given convex set. The most natural solution is to take the Euclidean projection of the data vector on to this convex set; in…
Robust mean estimation is the problem of estimating the mean $\mu \in \mathbb{R}^d$ of a $d$-dimensional distribution $D$ from a list of independent samples, an $\epsilon$-fraction of which have been arbitrarily corrupted by a malicious…
We consider the classical problem of estimating the covariance matrix of a subgaussian distribution from i.i.d. samples in the novel context of coarse quantization, i.e., instead of having full knowledge of the samples, they are quantized…
We study polynomial time algorithms for estimating the mean of a heavy-tailed multivariate random vector. We assume only that the random vector $X$ has finite mean and covariance. In this setting, the radius of confidence intervals achieved…
In many areas of science one aims to estimate latent sub-population mean curves based only on observations of aggregated population curves. By aggregated curves we mean linear combination of functional data that cannot be observed…
Gaussian process regression is a powerful Bayesian nonlinear regression method. Recent research has enabled the capture of many types of observations using non-Gaussian likelihoods. To deal with various tasks in spatial modeling, we benefit…
Partition-wise models offer a flexible approach for modeling complex and multidimensional data that are capable of producing interpretable results. They are based on partitioning the observed data into regions, each of which is modeled with…
We study the problem of estimating the parameters of a Gaussian distribution when samples are only shown if they fall in some (unknown) subset $S \subseteq \R^d$. This core problem in truncated statistics has long history going back to…