Related papers: Learning from Untrusted Data

A Data Prism: Semi-Verified Learning in the Small-Alpha Regime

We consider a model of unreliable or crowdsourced data where there is an underlying set of $n$ binary variables, each evaluator contributes a (possibly unreliable or adversarial) estimate of the values of some subset of $r$ of the…

Machine Learning · Computer Science · 2017-08-10 · Michela Meister, Gregory Valiant

LiD-FL: Towards List-Decodable Federated Learning

Federated learning is often used in environments with many unverified participants. Therefore, federated learning under adversarial attacks receives significant attention. This paper proposes an algorithmic framework for list-decodable…

Machine Learning · Computer Science · 2025-02-28 · Hong Liu, Liren Shan, Han Bao, Ronghui You, Yuhao Yi, Jiancheng Lv

Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation

Many machine learning approaches are characterized by information constraints on how they interact with the training data. These include memory and sequential access constraints (e.g. fast first-order methods to solve stochastic…

Machine Learning · Computer Science · 2014-10-29 · Ohad Shamir

Robust Learning from Untrusted Sources

Modern machine learning methods often require more data for training than a single expert can provide. Therefore, it has become a standard procedure to collect data from external sources, e.g. via crowdsourcing. Unfortunately, the quality…

Machine Learning · Computer Science · 2019-05-20 · Nikola Konstantinov, Christoph Lampert

Learning Discrete Distributions from Untrusted Batches

We consider the problem of learning a discrete distribution in the presence of an $\epsilon$ fraction of malicious data sources. Specifically, we consider the setting where there is some underlying distribution, $p$, and each data source…

Machine Learning · Computer Science · 2017-11-23 · Mingda Qiao, Gregory Valiant

Probabilistic Inference for Learning from Untrusted Sources

Federated learning brings potential benefits of faster learning, better solutions, and a greater propensity to transfer when heterogeneous data from different parties increases diversity. However, because federated learning tasks tend to be…

Machine Learning · Computer Science · 2021-01-18 · Duc Thien Nguyen, Shiau Hoong Lim, Laura Wynter, Desmond Cai

Data Selection: A General Principle for Building Small Interpretable Models

We present convincing empirical evidence for an effective and general strategy for building accurate small models. Such models are attractive for interpretability and also find use in resource-constrained environments. The strategy is to…

Machine Learning · Computer Science · 2024-04-30 · Abhishek Ghose

High-Accuracy List-Decodable Mean Estimation

In list-decodable learning, we are given a set of data points such that an $\alpha$-fraction of these points come from a nice distribution $D$, for some small $\alpha \ll 1$, and the goal is to output a short list of candidate solutions,…

Machine Learning · Computer Science · 2025-11-25 · Ziyun Chen, Spencer Compton, Daniel Kane, Jerry Li

List-Decodable Linear Regression

We give the first polynomial-time algorithm for robust regression in the list-decodable setting where an adversary can corrupt a greater than $1/2$ fraction of examples. For any $\alpha < 1$, our algorithm takes as input a sample…

Data Structures and Algorithms · Computer Science · 2019-05-31 · Sushrut Karmalkar, Adam R. Klivans, Pravesh K. Kothari

Robust Generalization despite Distribution Shift via Minimum Discriminating Information

Training models that perform well under distribution shifts is a central challenge in machine learning. In this paper, we introduce a modeling framework where, in addition to training data, we have partial structural knowledge of the…

Machine Learning · Computer Science · 2021-10-28 · Tobias Sutter, Andreas Krause, Daniel Kuhn

Efficiently Learning Structured Distributions from Untrusted Batches

We study the problem, introduced by Qiao and Valiant, of learning from untrusted batches. Here, we assume $m$ users, all of whom have samples from some underlying distribution $p$ over $\{1, \ldots, n\}$. Each user sends a batch of $k$ i.i.d.…

Data Structures and Algorithms · Computer Science · 2019-11-07 · Sitan Chen, Jerry Li, Ankur Moitra

Assisted Learning for Organizations with Limited Imbalanced Data

In the era of big data, many big organizations are integrating machine learning into their work pipelines to facilitate data analysis. However, the performance of their trained models is often restricted by limited and imbalanced data…

Machine Learning · Computer Science · 2024-03-05 · Cheng Chen, Jiaying Zhou, Jie Ding, Yi Zhou

Improvability Through Semi-Supervised Learning: A Survey of Theoretical Results

Semi-supervised learning is a setting in which one has labeled and unlabeled data available. In this survey we explore different types of theoretical results when one uses unlabeled data in classification and regression tasks. Most methods…

Machine Learning · Computer Science · 2020-07-31 · Alexander Mey, Marco Loog

Efficient List-Decodable Regression using Batches

We begin the study of list-decodable linear regression using batches. In this setting only an $\alpha \in (0,1]$ fraction of the batches are genuine. Each genuine batch contains $\ge n$ i.i.d. samples from a common unknown distribution and…

Machine Learning · Computer Science · 2022-11-24 · Abhimanyu Das, Ayush Jain, Weihao Kong, Rajat Sen

On the Optimality of Averaging in Distributed Statistical Learning

A common approach to statistical learning with big-data is to randomly split it among $m$ machines and learn the parameter of interest by averaging the $m$ individual estimates. In this paper, focusing on empirical risk minimization, or…

Machine Learning · Statistics · 2016-06-14 · Jonathan Rosenblatt, Boaz Nadler

Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data

Data used to train machine learning models can be adversarial: maliciously constructed by adversaries to fool the model. Challenge also arises by privacy, confidentiality, or due to legal constraints when data are geographically gathered…

Machine Learning · Computer Science · 2020-07-09 · Alireza Sadeghi, Gang Wang, Meng Ma, Georgios B. Giannakis

Learning from networked examples

Many machine learning algorithms are based on the assumption that training examples are drawn independently. However, this assumption does not hold anymore when learning from a networked sample because two or more training examples may…

Artificial Intelligence · Computer Science · 2017-06-06 · Yuyi Wang, Jan Ramon, Zheng-Chu Guo

The Stochastic Replica Approach to Machine Learning: Stability and Parameter Optimization

We introduce a statistical physics inspired supervised machine learning algorithm for classification and regression problems. The method is based on the invariances or stability of predicted results when known data is represented as…

Machine Learning · Statistics · 2018-11-19 · Patrick Chao, Tahereh Mazaheri, Bo Sun, Nicholas B. Weingartner, Zohar Nussinov

Incremental Learning-to-Learn with Statistical Guarantees

In learning-to-learn the goal is to infer a learning algorithm that works well on a class of tasks sampled from an unknown meta distribution. In contrast to previous work on batch learning-to-learn, we consider a scenario where tasks are…

Machine Learning · Statistics · 2018-03-23 · Giulia Denevi, Carlo Ciliberto, Dimitris Stamos, Massimiliano Pontil

Learning Models with Uniform Performance via Distributionally Robust Optimization

A common goal in statistics and machine learning is to learn models that can perform well against distributional shifts, such as latent heterogeneous subpopulations, unknown covariate shifts, or unmodeled temporal effects. We develop and…

Machine Learning · Statistics · 2020-07-21 · John Duchi, Hongseok Namkoong