Related papers: Instance Optimal Learning

Differentially Private Sampling from Distributions

We initiate an investigation of private sampling from distributions. Given a dataset with $n$ independent observations from an unknown distribution $P$, a sampling algorithm must output a single observation from a distribution that is close…

Machine Learning · Computer Science 2022-11-16 Sofya Raskhodnikova, Satchit Sivakumar, Adam Smith, Marika Swanberg

Estimating Learnability in the Sublinear Data Regime

We consider the problem of estimating how well a model class is capable of fitting a distribution of labeled data. We show that it is often possible to accurately estimate this "learnability" even when given an amount of data that is too…

Machine Learning · Computer Science 2019-03-26 Weihao Kong, Gregory Valiant

Instance-Dependent PU Learning by Bayesian Optimal Relabeling

When learning from positive and unlabelled data, it is a strong assumption that the positive observations are randomly sampled from the distribution of $X$ conditional on $Y = 1$, where $X$ stands for the feature and $Y$ the label. Most…

Machine Learning · Computer Science 2020-03-04 Fengxiang He, Tongliang Liu, Geoffrey I Webb, Dacheng Tao

Learning from Noisy Label Distributions

In this paper, we consider a novel machine learning problem, that is, learning a classifier from noisy label distributions. In this problem, each instance with a feature vector belongs to at least one group. Then, instead of the true label…

Machine Learning · Computer Science 2017-08-17 Yuya Yoshikawa

Learning-based Statistical Refinement for Denoising

This work proposes a learning-based statistical refinement method for improving the denoising results of a given denoiser without knowing the precise noise distribution or accessing clean images or calibration data. While there are many…

Machine Learning · Computer Science 2026-05-07 Rihuan Ke

Towards Arbitrary Noise Augmentation - Deep Learning for Sampling from Arbitrary Probability Distributions

Accurate noise modelling is important for training of deep learning reconstruction algorithms. While noise models are well known for traditional imaging techniques, the noise distribution of a novel sensor may be difficult to determine a…

Machine Learning · Computer Science 2018-07-11 Felix Horger, Tobias Würfl, Vincent Christlein, Andreas Maier

Pattern Recognition for Conditionally Independent Data

In this work we consider the task of relaxing the i.i.d. assumption in pattern recognition (or classification), aiming to make existing learning algorithms applicable to a wider range of tasks. Pattern recognition is guessing a discrete…

Machine Learning · Computer Science 2012-02-28 Daniil Ryabko

Probabilistic Decoupling of Labels in Classification

In this paper we develop a principled, probabilistic, unified approach to non-standard classification tasks, such as semi-supervised, positive-unlabelled, multi-positive-unlabelled and noisy-label learning. We train a classifier on the…

Machine Learning · Computer Science 2020-06-17 Jeppe Nørregaard, Lars Kai Hansen

Learning Discrete Distributions from Untrusted Batches

We consider the problem of learning a discrete distribution in the presence of an $\epsilon$ fraction of malicious data sources. Specifically, we consider the setting where there is some underlying distribution, $p$, and each data source…

Machine Learning · Computer Science 2017-11-23 Mingda Qiao, Gregory Valiant

Nonparametric Divergence Estimation with Applications to Machine Learning on Distributions

Low-dimensional embedding, manifold learning, clustering, classification, and anomaly detection are among the most important problems in machine learning. The existing methods usually consider the case when each instance has a fixed,…

Machine Learning · Computer Science 2012-02-20 Barnabas Poczos, Liang Xiong, Jeff Schneider

Efficient Data-Dependent Learnability

The predictive normalized maximum likelihood (pNML) approach has recently been proposed as the min-max optimal solution to the batch learning problem where both the training set and the test data feature are individuals, known sequences.…

Machine Learning · Computer Science 2020-11-23 Yaniv Fogel, Tal Shapira, Meir Feder

Domain Adaptation for Statistical Classifiers

The most basic assumption used in statistical learning theory is that training data and test data are drawn from the same underlying distribution. Unfortunately, in many applications, the "in-domain" test data is drawn from a distribution…

Machine Learning · Computer Science 2011-09-30 H. Daume, D. Marcu

Instance-Optimal Private Density Estimation in the Wasserstein Distance

Estimating the density of a distribution from samples is a fundamental problem in statistics. In many practical settings, the Wasserstein distance is an appropriate error metric for density estimation. For example, when estimating…

Machine Learning · Computer Science 2024-07-01 Vitaly Feldman, Audra McMillan, Satchit Sivakumar, Kunal Talwar

Instance-Dependent Partial Label Learning

Partial label learning (PLL) is a typical weakly supervised learning problem, where each training example is associated with a set of candidate labels among which only one is true. Most existing PLL approaches assume that the incorrect…

Machine Learning · Computer Science 2021-10-27 Ning Xu, Congyu Qiao, Xin Geng, Min-Ling Zhang

Detecting Out-of-Distribution Samples via Conditional Distribution Entropy with Optimal Transport

When deploying a trained machine learning model in the real world, it is inevitable to receive inputs from out-of-distribution (OOD) sources. For instance, in continual learning settings, it is common to encounter OOD samples due to the…

Machine Learning · Computer Science 2024-01-23 Chuanwen Feng, Wenlong Chen, Ao Ke, Yilong Ren, Xike Xie, S. Kevin Zhou

IRL with Partial Observations using the Principle of Uncertain Maximum Entropy

The principle of maximum entropy is a broadly applicable technique for computing a distribution with the least amount of information possible while constrained to match empirically estimated feature expectations. However, in many real-world…

Machine Learning · Computer Science 2022-08-16 Kenneth Bogert, Yikang Gui, Prashant Doshi

Optimal Algorithms for Augmented Testing of Discrete Distributions

We consider the problem of hypothesis testing for discrete distributions. In the standard model, where we have sample access to an underlying distribution $p$, extensive research has established optimal bounds for uniformity testing,…

Machine Learning · Computer Science 2024-12-03 Maryam Aliakbarpour, Piotr Indyk, Ronitt Rubinfeld, Sandeep Silwal

Learning-to-Learn Stochastic Gradient Descent with Biased Regularization

We study the problem of learning-to-learn: inferring a learning algorithm that works well on tasks sampled from an unknown distribution. As class of algorithms we consider Stochastic Gradient Descent on the true risk regularized by the…

Machine Learning · Computer Science 2019-03-26 Giulia Denevi, Carlo Ciliberto, Riccardo Grazzi, Massimiliano Pontil

Optimal Testing for Properties of Distributions

Given samples from an unknown distribution $p$, is it possible to distinguish whether $p$ belongs to some class of distributions $\mathcal{C}$ versus $p$ being far from every distribution in $\mathcal{C}$? This fundamental question has…

Data Structures and Algorithms · Computer Science 2015-12-09 Jayadev Acharya, Constantinos Daskalakis, Gautam Kamath

An Improved Algorithm for Learning Drifting Discrete Distributions

We present a new adaptive algorithm for learning discrete distributions under distribution drift. In this setting, we observe a sequence of independent samples from a discrete distribution that is changing over time, and the goal is to…

Machine Learning · Computer Science 2024-03-11 Alessio Mazzetto