Related papers: Computing High-dimensional Confidence Sets for Arb…

Learning Confidence Ellipsoids and Applications to Robust Subspace Recovery

We study the problem of finding confidence ellipsoids for an arbitrary distribution in high dimensions. Given samples from a distribution $D$ and a confidence parameter $\alpha$, the goal is to find the smallest volume ellipsoid $E$ which…

Data Structures and Algorithms · Computer Science 2026-05-12 Chao Gao, Liren Shan, Vaidehi Srinivas, Aravindan Vijayaraghavan

Robust Estimators in High Dimensions without the Computational Intractability

We study high-dimensional distribution learning in an agnostic setting where an adversary is allowed to arbitrarily corrupt an $\varepsilon$-fraction of the samples. Such questions have a rich history spanning statistics, machine learning…

Data Structures and Algorithms · Computer Science 2019-03-18 Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Ankur Moitra, Alistair Stewart

A Note on High-Dimensional Confidence Regions

Recent advances in statistics introduced versions of the central limit theorem for high-dimensional vectors, allowing for the construction of confidence regions for high-dimensional parameters. In this note, $s$-sparsely convex…

Statistics Theory · Mathematics 2021-05-20 Sven Klaassen

Testing the Manifold Hypothesis

The hypothesis that high dimensional data tend to lie in the vicinity of a low dimensional manifold is the basis of manifold learning. The goal of this paper is to develop an algorithm (with accompanying complexity guarantees) for fitting a…

Statistics Theory · Mathematics 2013-12-23 Charles Fefferman, Sanjoy Mitter, Hariharan Narayanan

Optimal Confidence Regions for the Multinomial Parameter

Construction of tight confidence regions and intervals is central to statistical inference and decision making. This paper develops new theory showing minimum average volume confidence regions for categorical data. More precisely, consider…

Machine Learning · Statistics 2021-02-01 Matthew L. Malloy, Ardhendu Tripathy, Robert D. Nowak

High-Dimensional Robust Mean Estimation in Nearly-Linear Time

We study the fundamental problem of high-dimensional mean estimation in a robust model where a constant fraction of the samples are adversarially corrupted. Recent work gave the first polynomial time algorithms for this problem with…

Machine Learning · Computer Science 2018-11-26 Yu Cheng, Ilias Diakonikolas, Rong Ge

Covering of high-dimensional sets

Let $(\mathcal{X},\rho)$ be a metric space and $\lambda$ be a Borel measure on this space defined on the $\sigma$-algebra generated by open subsets of $\mathcal{X}$; this measure $\lambda$ defines volumes of Borel subsets of $\mathcal{X}$.…

Optimization and Control · Mathematics 2022-11-07 Anatoly Zhigljavsky, Jack Noonan

Convex Set Disjointness, Distributed Learning of Halfspaces, and LP Feasibility

We study the Convex Set Disjointness (CSD) problem, where two players have input sets taken from an arbitrary fixed domain $U \subseteq \mathbb{R}^d$ of size $\lvert U\rvert = n$. Their mutual goal is to decide using minimum communication…

Data Structures and Algorithms · Computer Science 2019-09-10 Mark Braverman, Gillat Kol, Shay Moran, Raghuvansh R. Saxena

Robust Mean Estimation on Highly Incomplete Data with Arbitrary Outliers

We study the problem of robustly estimating the mean of a $d$-dimensional distribution given $N$ examples, where most coordinates of every example may be missing and $\varepsilon N$ examples may be arbitrarily corrupted. Assuming each…

Data Structures and Algorithms · Computer Science 2021-05-04 Lunjia Hu, Omer Reingold

Geometry of a Set and its Random covers

Let $E$ be a bounded open subset of $\mathbb{R}^n$. We study the following questions: For i.i.d. samples $X_1, \dots, X_N$ drawn uniformly from $E$, what is the probability that $\cup_i \mathbf{B}(X_i, \delta)$, the union of $\delta$-balls…

Probability · Mathematics 2023-07-06 Enrique Alvarado, Bala Krishnamoorthy, Kevin R. Vixie

High-Dimensional Robust Mean Estimation with Untrusted Batches

We study high-dimensional mean estimation in a collaborative setting where data is contributed by $N$ users in batches of size $n$. In this environment, a learner seeks to recover the mean $\mu$ of a true distribution $P$ from a collection…

Machine Learning · Computer Science 2026-02-25 Maryam Aliakbarpour, Vladimir Braverman, Yuhan Liu, Junze Yin

Robust Conformal Volume Estimation in 3D Medical Images

Volumetry is one of the principal downstream applications of 3D medical image segmentation, for example, to detect abnormal tissue growth or for surgery planning. Conformal Prediction is a promising framework for uncertainty quantification,…

Computer Vision and Pattern Recognition · Computer Science 2024-07-30 Benjamin Lambert, Florence Forbes, Senan Doyle, Michel Dojat

Exact Minimum-Volume Confidence Set Intersection for Multinomial Outcomes

Computation of confidence sets is central to data science and machine learning, serving as the workhorse of A/B testing and underpinning the operation and analysis of reinforcement learning algorithms. Among all valid confidence sets for…

Machine Learning · Statistics 2026-01-27 Heguang Lin, Binhao Chen, Mengze Li, Daniel Pimentel-Alarcón, Matthew L. Malloy

Uniform Convergence Rate of the Kernel Density Estimator Adaptive to Intrinsic Volume Dimension

We derive concentration inequalities for the supremum norm of the difference between a kernel density estimator (KDE) and its point-wise expectation that hold uniformly over the selection of the bandwidth and under weaker conditions on the…

Statistics Theory · Mathematics 2020-01-01 Jisu Kim, Jaehyeok Shin, Alessandro Rinaldo, Larry Wasserman

Geometry of the Minimum Volume Confidence Sets

Computation of confidence sets is central to data science and machine learning, serving as the workhorse of A/B testing and underpinning the operation and analysis of reinforcement learning algorithms. This paper studies the geometry of the…

Machine Learning · Statistics 2022-02-17 Heguang Lin, Mengze Li, Daniel Pimentel-Alarcón, Matthew Malloy

Proportional Volume Sampling and Approximation Algorithms for A-Optimal Design

We study the optimal design problems where the goal is to choose a set of linear measurements to obtain the most accurate estimate of an unknown vector in $d$ dimensions. We study the $A$-optimal design variant where the objective is to…

Data Structures and Algorithms · Computer Science 2018-07-18 Aleksandar Nikolov, Mohit Singh, Uthaipon Tao Tantipongpipat

Statistical Learning of Arbitrary Computable Classifiers

Statistical learning theory chiefly studies restricted hypothesis classes, particularly those with finite Vapnik-Chervonenkis (VC) dimension. The fundamental quantity of interest is the sample complexity: the number of samples required to…

Machine Learning · Computer Science 2008-07-10 David Soloveichik

Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms

Let $p$ be an unknown and arbitrary probability distribution over $[0,1)$. We consider the problem of density estimation, in which a learning algorithm is given i.i.d. draws from $p$ and must (with high probability) output a…

Machine Learning · Computer Science 2014-11-04 Siu-On Chan, Ilias Diakonikolas, Rocco A. Servedio, Xiaorui Sun

Faster Algorithms for High-Dimensional Robust Covariance Estimation

We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial time algorithms for this problem…

Machine Learning · Computer Science 2019-06-12 Yu Cheng, Ilias Diakonikolas, Rong Ge, David Woodruff

High-Dimensional Probability Estimation with Deep Density Models

One of the fundamental problems in machine learning is the estimation of a probability distribution from data. Many techniques have been proposed to study the structure of data, most often building around the assumption that observations…

Machine Learning · Statistics 2013-02-22 Oren Rippel, Ryan Prescott Adams