Related papers: Efficient List-Decodable Regression using Batches

Batch List-Decodable Linear Regression via Higher Moments

We study the task of list-decodable linear regression using batches. A batch is called clean if it consists of i.i.d. samples from an unknown linear regression distribution. For a parameter $\alpha \in (0, 1/2)$, an unknown…

Machine Learning · Computer Science 2025-03-14 Ilias Diakonikolas, Daniel M. Kane, Sushrut Karmalkar, Sihan Liu, Thanasis Pittas

List-Decodable Linear Regression

We give the first polynomial-time algorithm for robust regression in the list-decodable setting where an adversary can corrupt a greater than $1/2$ fraction of examples. For any $\alpha < 1$, our algorithm takes as input a sample…

Data Structures and Algorithms · Computer Science 2019-05-31 Sushrut Karmalkar, Adam R. Klivans, Pravesh K. Kothari

List-Decodable Mean Estimation in Nearly-PCA Time

Traditionally, robust statistics has focused on designing estimators tolerant to a minority of contaminated data. Robust list-decodable learning focuses on the more challenging regime where only a minority $\frac 1 k$ fraction of the…

Data Structures and Algorithms · Computer Science 2020-11-20 Ilias Diakonikolas, Daniel M. Kane, Daniel Kongsgaard, Jerry Li, Kevin Tian

List-Decodable Sparse Mean Estimation

Robust mean estimation is one of the most important problems in statistics: given a set of samples in $\mathbb{R}^d$ where an $\alpha$ fraction are drawn from some distribution $D$ and the rest are adversarially corrupted, we aim to…

Machine Learning · Computer Science 2022-12-07 Shiwei Zeng, Jie Shen

List-Decodable Regression via Expander Sketching

We introduce an expander-sketching framework for list-decodable linear regression that achieves sample complexity $\tilde{O}((d+\log(1/\delta))/\alpha)$, list size $O(1/\alpha)$, and near input-sparsity running time…

Machine Learning · Computer Science 2025-12-01 Herbod Pourali, Sajjad Hashemian, Ebrahim Ardeshir-Larijani

Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation

We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset. Specifically, we are given a set $T$ of $n$ points in $\mathbb{R}^d$ and a parameter $0< \alpha <\frac 1 2$ such that an…

Data Structures and Algorithms · Computer Science 2021-11-15 Ilias Diakonikolas, Daniel M. Kane, Daniel Kongsgaard, Jerry Li, Kevin Tian

High-Accuracy List-Decodable Mean Estimation

In list-decodable learning, we are given a set of data points such that an $\alpha$-fraction of these points come from a nice distribution $D$, for some small $\alpha \ll 1$, and the goal is to output a short list of candidate solutions,…

Machine Learning · Computer Science 2025-11-25 Ziyun Chen, Spencer Compton, Daniel Kane, Jerry Li

List Decodable Learning via Sum of Squares

In the list-decodable learning setup, an overwhelming majority (say a $(1-\beta)$-fraction) of the input data consists of outliers and the goal of an algorithm is to output a small list $\mathcal{L}$ of hypotheses such that one of them agrees…

Data Structures and Algorithms · Computer Science 2019-05-14 Prasad Raghavendra, Morris Yau

List-Decodable Subspace Recovery: Dimension Independent Error in Polynomial Time

In list-decodable subspace recovery, the input is a collection of $n$ points, $\alpha n$ (for some $\alpha \ll 1/2$) of which are drawn i.i.d. from a distribution $\mathcal{D}$ with an isotropic rank $r$ covariance $\Pi_*$ (the…

Data Structures and Algorithms · Computer Science 2021-01-08 Ainesh Bakshi, Pravesh K. Kothari

List-Decodable Sparse Mean Estimation via Difference-of-Pairs Filtering

We study the problem of list-decodable sparse mean estimation. Specifically, for a parameter $\alpha \in (0, 1/2)$, we are given $m$ points in $\mathbb{R}^n$, $\lfloor \alpha m \rfloor$ of which are i.i.d. samples from a distribution $D$…

Data Structures and Algorithms · Computer Science 2024-07-08 Ilias Diakonikolas, Daniel M. Kane, Sushrut Karmalkar, Ankit Pensia, Thanasis Pittas

Learning from Untrusted Data

The vast majority of theoretical results in machine learning and statistics assume that the available training data is a reasonably reliable reflection of the phenomena to be learned or estimated. Similarly, the majority of machine learning…

Machine Learning · Computer Science 2017-06-13 Moses Charikar, Jacob Steinhardt, Gregory Valiant

List Decodable Mean Estimation in Nearly Linear Time

Learning from data in the presence of outliers is a fundamental problem in statistics. Until recently, no computationally efficient algorithms were known to compute the mean of a high dimensional distribution under natural assumptions in…

Data Structures and Algorithms · Computer Science 2021-01-22 Yeshwanth Cherapanamjeri, Sidhanth Mohanty, Morris Yau

List-Decodable Covariance Estimation

We give the first polynomial-time algorithm for list-decodable covariance estimation. For any $\alpha > 0$, our algorithm takes as input a sample $Y \subseteq \mathbb{R}^d$ of size $n\geq d^{\mathsf{poly}(1/\alpha)}$ obtained by…

Data Structures and Algorithms · Computer Science 2022-06-23 Misha Ivkov, Pravesh K. Kothari

List Decodable Subspace Recovery

Learning from data in the presence of outliers is a fundamental problem in statistics. In this work, we study robust statistics in the presence of overwhelming outliers for the fundamental problem of subspace recovery. Given a dataset where…

Data Structures and Algorithms · Computer Science 2020-02-11 Prasad Raghavendra, Morris Yau

Statistical Query Lower Bounds for List-Decodable Linear Regression

We study the problem of list-decodable linear regression, where an adversary can corrupt a majority of the examples. Specifically, we are given a set $T$ of labeled examples $(x, y) \in \mathbb{R}^d \times \mathbb{R}$ and a parameter $0<…

Data Structures and Algorithms · Computer Science 2021-06-18 Ilias Diakonikolas, Daniel M. Kane, Ankit Pensia, Thanasis Pittas, Alistair Stewart

Linear Regression using Heterogeneous Data Batches

In many learning applications, data are collected from multiple sources, each providing a batch of samples that by itself is insufficient to learn its input-output relationship. A common approach assumes that the sources fall in one…

Machine Learning · Computer Science 2023-09-06 Ayush Jain, Rajat Sen, Weihao Kong, Abhimanyu Das, Alon Orlitsky

Optimal Robust Learning of Discrete Distributions from Batches

Many applications, including natural language processing, sensor networks, collaborative filtering, and federated learning, call for estimating discrete distributions from data collected in batches, some of which may be untrustworthy,…

Machine Learning · Computer Science 2020-02-26 Ayush Jain, Alon Orlitsky

List-Decodable Robust Mean Estimation and Learning Mixtures of Spherical Gaussians

We study the problem of list-decodable Gaussian mean estimation and the related problem of learning mixtures of separated spherical Gaussians. We develop a set of techniques that yield new efficient algorithms with significantly improved…

Data Structures and Algorithms · Computer Science 2017-11-21 Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart

List-Decodable Mean Estimation via Iterative Multi-Filtering

We study the problem of list-decodable mean estimation for bounded covariance distributions. Specifically, we are given a set $T$ of points in $\mathbb{R}^d$ with the promise that an unknown $\alpha$-fraction of points in $T$, where…

Machine Learning · Computer Science 2020-06-23 Ilias Diakonikolas, Daniel M. Kane, Daniel Kongsgaard

Learning Discrete Distributions from Untrusted Batches

We consider the problem of learning a discrete distribution in the presence of an $\epsilon$ fraction of malicious data sources. Specifically, we consider the setting where there is some underlying distribution, $p$, and each data source…

Machine Learning · Computer Science 2017-11-23 Mingda Qiao, Gregory Valiant