Related papers: Unsupervised Risk Estimation Using Only Conditiona…

Unsupervised Supervised Learning II: Training Margin Based Classifiers without Labels

Many popular linear classifiers, such as logistic regression, boosting, or SVM, are trained by optimizing a margin-based risk function. Traditionally, these risk functions are computed based on a labeled dataset. We develop a novel…

Machine Learning · Computer Science 2010-07-23 Krishnakumar Balasubramanian, Pinar Donmez, Guy Lebanon

Training on Test Data with Bayesian Adaptation for Covariate Shift

When faced with distribution shift at test time, deep neural networks often make inaccurate predictions with unreliable uncertainty estimates. While improving the robustness of neural networks is one promising approach to mitigate this…

Machine Learning · Computer Science 2021-09-28 Aurick Zhou, Sergey Levine

Unsupervised Ensemble Learning with Dependent Classifiers

In unsupervised ensemble learning, one obtains predictions from multiple sources or classifiers, yet without knowing the reliability and expertise of each source, and with no labeled data to assess it. The task is to combine these possibly…

Machine Learning · Computer Science 2016-02-24 Ariel Jaffe, Ethan Fetaya, Boaz Nadler, Tingting Jiang, Yuval Kluger

On Different Notions of Redundancy in Conditional-Independence-Based Discovery of Graphical Models

Conditional-independence-based discovery uses statistical tests to identify a graphical model that represents the independence structure of variables in a dataset. These tests, however, can be unreliable, and algorithms are sensitive to…

Machine Learning · Computer Science 2026-04-21 Philipp M. Faller, Dominik Janzing

Unlearning Evaluation through Subset Statistical Independence

Evaluating machine unlearning remains challenging, as existing methods typically require retraining reference models or performing membership inference attacks, both of which rely on prior access to training configuration or supervision…

Machine Learning · Computer Science 2026-03-03 Chenhao Zhang, Muxing Li, Feng Liu, Weitong Chen, Miao Xu

Learning the Structure of Generative Models without Labeled Data

Curating labeled training data has become the primary bottleneck in machine learning. Recent frameworks address this bottleneck with generative models to synthesize labels at scale from weak supervision sources. The generative model's…

Machine Learning · Computer Science 2017-09-12 Stephen H. Bach, Bryan He, Alexander Ratner, Christopher Ré

Model Specification Test with Unlabeled Data: Approach from Covariate Shift

We propose a novel framework of the model specification test in regression using unlabeled test data. In many cases, we have conducted statistical inferences based on the assumption that we can correctly specify a model. However, it is…

Methodology · Statistics 2020-02-25 Masahiro Kato, Hikaru Kawarazaki

Estimating the Accuracies of Multiple Classifiers Without Labeled Data

In various situations one is given only the predictions of multiple classifiers over a large unlabeled test dataset. This scenario raises the following questions: Without any labeled data and without any a-priori knowledge about the…

Machine Learning · Statistics 2014-10-31 Ariel Jaffe, Boaz Nadler, Yuval Kluger

Semi-Supervised Empirical Risk Minimization: Using unlabeled data to improve prediction

We present a general methodology for using unlabeled data to design semi-supervised learning (SSL) variants of the Empirical Risk Minimization (ERM) learning process. Focusing on generalized linear regression, we analyze the…

Machine Learning · Statistics 2022-03-08 Oren Yuval, Saharon Rosset

Comparing the Value of Labeled and Unlabeled Data in Method-of-Moments Latent Variable Estimation

Labeling data for modern machine learning is expensive and time-consuming. Latent variable models can be used to infer labels from weaker, easier-to-acquire sources operating on unlabeled data. Such models can also be trained using labeled…

Machine Learning · Computer Science 2021-03-05 Mayee F. Chen, Benjamin Cohen-Wang, Stephen Mussmann, Frederic Sala, Christopher Ré

Dependable Exploitation of High-Dimensional Unlabeled Data in an Assumption-Lean Framework

Semi-supervised learning has attracted significant attention due to the proliferation of applications featuring limited labeled data but abundant unlabeled data. In this paper, we examine the statistical inference problem in an…

Methodology · Statistics 2026-03-31 Chao Ying, Siyi Deng, Yang Ning, Jiwei Zhao, Heping Zhang

On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data

Empirical risk minimization (ERM), with a proper loss function and regularization, is the common practice of supervised classification. In this paper, we study training an arbitrary (from linear to deep) binary classifier from only unlabeled (U)…

Machine Learning · Statistics 2019-03-13 Nan Lu, Gang Niu, Aditya Krishna Menon, Masashi Sugiyama

Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles

When a deep learning model is deployed in the wild, it can encounter test data drawn from distributions different from the training data distribution and suffer a drop in performance. For safe deployment, it is essential to estimate the…

Machine Learning · Computer Science 2023-05-16 Jiefeng Chen, Frederick Liu, Besim Avci, Xi Wu, Yingyu Liang, Somesh Jha

Likelihood-based semi-supervised model selection with applications to speech processing

In conventional supervised pattern recognition tasks, model selection is typically accomplished by minimizing the classification error rate on a set of so-called development data, subject to ground-truth labeling by human experts or some…

Machine Learning · Statistics 2011-08-25 Christopher M. White, Sanjeev P. Khudanpur, Patrick J. Wolfe

Estimating Accuracy from Unlabeled Data: A Probabilistic Logic Approach

We propose an efficient method to estimate the accuracy of classifiers using only unlabeled data. We consider a setting with multiple classification problems where the target classes may be tied together through logical constraints. For…

Machine Learning · Computer Science 2017-05-22 Emmanouil A. Platanios, Hoifung Poon, Tom M. Mitchell, Eric Horvitz

Estimation of prediction error with known covariate shift

In supervised learning, the estimation of prediction error on unlabeled test data is an important task. Existing methods are usually built on the assumption that the training and test data are sampled from the same distribution, which is…

Methodology · Statistics 2022-09-30 Hui Xu, Robert Tibshirani

Unsupervised Sequence Classification using Sequential Output Statistics

We consider learning a sequence classifier without labeled data by using sequential output statistics. The problem is highly valuable since obtaining labels in training data is often costly, while the sequential output statistics (e.g.,…

Machine Learning · Computer Science 2017-05-30 Yu Liu, Jianshu Chen, Li Deng

Non-Asymptotic Performance of Social Machine Learning Under Limited Data

This paper studies the probability of error associated with the social machine learning framework, which involves an independent training phase followed by a cooperative decision-making phase over a graph. This framework addresses the…

Machine Learning · Computer Science 2024-07-10 Ping Hu, Virginia Bordignon, Mert Kayaalp, Ali H. Sayed

Multi-Complementary and Unlabeled Learning for Arbitrary Losses and Models

A weakly supervised learning framework called complementary-label learning has been proposed recently, where each sample is equipped with a single complementary label that denotes one of the classes the sample does not belong to. However,…

Machine Learning · Statistics 2020-07-24 Yuzhou Cao, Shuqi Liu, Yitian Xu

Testing Conditional Independence in Supervised Learning Algorithms

We propose the conditional predictive impact (CPI), a consistent and unbiased estimator of the association between one or several features and a given outcome, conditional on a reduced feature set. Building on the knockoff framework of…

Methodology · Statistics 2021-05-14 David S. Watson, Marvin N. Wright