Related papers: Computing High-dimensional Confidence Sets for Arb…
We study the problem of finding confidence ellipsoids for an arbitrary distribution in high dimensions. Given samples from a distribution $D$ and a confidence parameter $\alpha$, the goal is to find the smallest volume ellipsoid $E$ which…
We study high-dimensional distribution learning in an agnostic setting where an adversary is allowed to arbitrarily corrupt an $\varepsilon$-fraction of the samples. Such questions have a rich history spanning statistics, machine learning…
Recent advances in statistics introduced versions of the central limit theorem for high-dimensional vectors, allowing for the construction of confidence regions for high-dimensional parameters. In this note, $s$-sparsely convex…
The hypothesis that high dimensional data tend to lie in the vicinity of a low dimensional manifold is the basis of manifold learning. The goal of this paper is to develop an algorithm (with accompanying complexity guarantees) for fitting a…
Construction of tight confidence regions and intervals is central to statistical inference and decision making. This paper develops new theory showing minimum average volume confidence regions for categorical data. More precisely, consider…
We study the fundamental problem of high-dimensional mean estimation in a robust model where a constant fraction of the samples are adversarially corrupted. Recent work gave the first polynomial time algorithms for this problem with…
Let $(\mathcal{X},\rho)$ be a metric space and $\lambda$ be a Borel measure on this space defined on the $\sigma$-algebra generated by open subsets of $\mathcal{X}$; this measure $\lambda$ defines volumes of Borel subsets of $\mathcal{X}$.…
We study the Convex Set Disjointness (CSD) problem, where two players have input sets taken from an arbitrary fixed domain~$U\subseteq \mathbb{R}^d$ of size $\lvert U\rvert = n$. Their mutual goal is to decide using minimum communication…
We study the problem of robustly estimating the mean of a $d$-dimensional distribution given $N$ examples, where most coordinates of every example may be missing and $\varepsilon N$ examples may be arbitrarily corrupted. Assuming each…
Let $E$ be a bounded open subset of $\mathbb{R}^n$. We study the following questions: For i.i.d. samples $X_1, \dots, X_N$ drawn uniformly from $E$, what is the probability that $\cup_i \mathbf{B}(X_i, \delta)$, the union of $\delta$-balls…
We study high-dimensional mean estimation in a collaborative setting where data is contributed by $N$ users in batches of size $n$. In this environment, a learner seeks to recover the mean $\mu$ of a true distribution $P$ from a collection…
Volumetry is one of the principal downstream applications of 3D medical image segmentation, for example, to detect abnormal tissue growth or for surgery planning. Conformal Prediction is a promising framework for uncertainty quantification,…
Computation of confidence sets is central to data science and machine learning, serving as the workhorse of A/B testing and underpinning the operation and analysis of reinforcement learning algorithms. Among all valid confidence sets for…
We derive concentration inequalities for the supremum norm of the difference between a kernel density estimator (KDE) and its point-wise expectation that hold uniformly over the selection of the bandwidth and under weaker conditions on the…
Computation of confidence sets is central to data science and machine learning, serving as the workhorse of A/B testing and underpinning the operation and analysis of reinforcement learning algorithms. This paper studies the geometry of the…
We study the optimal design problems where the goal is to choose a set of linear measurements to obtain the most accurate estimate of an unknown vector in $d$ dimensions. We study the $A$-optimal design variant where the objective is to…
Statistical learning theory chiefly studies restricted hypothesis classes, particularly those with finite Vapnik-Chervonenkis (VC) dimension. The fundamental quantity of interest is the sample complexity: the number of samples required to…
Let $p$ be an unknown and arbitrary probability distribution over $[0,1)$. We consider the problem of {\em density estimation}, in which a learning algorithm is given i.i.d. draws from $p$ and must (with high probability) output a…
We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial time algorithms for this problem…
One of the fundamental problems in machine learning is the estimation of a probability distribution from data. Many techniques have been proposed to study the structure of data, most often building around the assumption that observations…