Related papers: Labels, Information, and Computation: Efficient Le…

Unsupervised Selective Labeling for More Effective Semi-Supervised Learning

Given an unlabeled dataset and an annotation budget, we study how to selectively label a fixed number of instances so that semi-supervised learning (SSL) on such a partially labeled dataset is most effective. We focus on selecting the right…

Machine Learning · Computer Science 2023-08-24 Xudong Wang, Long Lian, Stella X. Yu

Reliable Semi-Supervised Learning when Labels are Missing at Random

Semi-supervised learning methods are motivated by the availability of large datasets with unlabeled features in addition to labeled data. Unlabeled data is, however, not guaranteed to improve classification performance and has in fact been…

Machine Learning · Statistics 2019-10-25 Xiuming Liu, Dave Zachariah, Johan Wågberg, Thomas B. Schön

Why the pseudo label based semi-supervised learning algorithm is effective?

Recently, pseudo label based semi-supervised learning has achieved great success in many fields. The core idea of the pseudo label based semi-supervised learning algorithm is to use the model trained on the labeled data to generate pseudo…

Machine Learning · Computer Science 2023-01-26 Zeping Min, Qian Ge, Cheng Tai

Learning from Complementary Labels

Collecting labeled data is costly and thus a critical bottleneck in real-world classification tasks. To mitigate this problem, we propose a novel setting, namely learning from complementary labels for multi-class classification. A…

Machine Learning · Statistics 2017-11-15 Takashi Ishida, Gang Niu, Weihua Hu, Masashi Sugiyama

Active clustering for labeling training data

Gathering training data is a key step of any supervised learning task, and it is both critical and expensive. Critical, because the quantity and quality of the training data have a high impact on the performance of the learned function.…

Data Structures and Algorithms · Computer Science 2021-10-28 Quentin Lutz, Élie de Panafieu, Alex Scott, Maya Stein

How to Achieve High Classification Accuracy with Just a Few Labels: A Semi-supervised Approach Using Sampled Packets

Network traffic classification, which has numerous applications from security to billing and network provisioning, has become a cornerstone of today's computer networks. Previous studies have developed traffic classification techniques…

Networking and Internet Architecture · Computer Science 2020-05-19 Shahbaz Rezaei, Xin Liu

Label Efficient Learning by Exploiting Multi-class Output Codes

We present a new perspective on the popular multi-class algorithmic techniques of one-vs-all and error correcting output codes. Rather than studying the behavior of these techniques for supervised learning, we establish a connection between…

Machine Learning · Computer Science 2016-11-28 Maria Florina Balcan, Travis Dick, Yishay Mansour

Rethinking the Value of Labels for Improving Class-Imbalanced Learning

Real-world data often exhibits long-tailed distributions with heavy class imbalance, posing great challenges for deep recognition models. We identify a persisting dilemma on the value of labels in the context of imbalanced learning: on the…

Machine Learning · Computer Science 2020-09-29 Yuzhe Yang, Zhi Xu

On the Informativeness of Supervision Signals

Supervised learning typically focuses on learning transferable representations from training examples annotated by humans. While rich annotations (like soft labels) carry more information than sparse annotations (like hard labels), they are…

Machine Learning · Computer Science 2023-07-06 Ilia Sucholutsky, Ruairidh M. Battleday, Katherine M. Collins, Raja Marjieh, Joshua C. Peterson, Pulkit Singh, Umang Bhatt, Nori Jacoby, Adrian Weller, Thomas L. Griffiths

Self-semi-supervised Learning to Learn from NoisyLabeled Data

The remarkable success of today's deep neural networks highly depends on a massive number of correctly labeled data. However, it is rather costly to obtain high-quality human-labeled data, leading to the active research area of training…

Machine Learning · Computer Science 2020-11-04 Jiacheng Wang, Yue Ma, Shuang Gao

Iterative Label Improvement: Robust Training by Confidence Based Filtering and Dataset Partitioning

State-of-the-art, high capacity deep neural networks not only require large amounts of labelled training data, they are also highly susceptible to label errors in this data, typically resulting in large efforts and costs and therefore…

Machine Learning · Computer Science 2020-07-20 Christian Haase-Schütz, Rainer Stal, Heinz Hertlein, Bernhard Sick

Informative missingness and its implications in semi-supervised learning

Semi-supervised learning (SSL) constructs classifiers using both labelled and unlabelled data. It leverages information from labelled samples, whose acquisition is often costly or labour-intensive, together with unlabelled data to enhance…

Machine Learning · Statistics 2025-12-29 Jinran Wu, You-Gan Wang, Geoffrey J. McLachlan

Multi-class Classification from Multiple Unlabeled Datasets with Partial Risk Regularization

Recent years have witnessed a great success of supervised deep learning, where predictive models were trained from a large amount of fully labeled data. However, in practice, labeling such big data can be very costly and may not even be…

Machine Learning · Computer Science 2022-10-18 Yuting Tang, Nan Lu, Tianyi Zhang, Masashi Sugiyama

Impact of Strategic Sampling and Supervision Policies on Semi-supervised Learning

In semi-supervised representation learning frameworks, when the number of labelled data is very scarce, the quality and representativeness of these samples become increasingly important. Existing literature on semi-supervised learning…

Computer Vision and Pattern Recognition · Computer Science 2024-11-05 Shuvendu Roy, Ali Etemad

Exploiting Class Learnability in Noisy Data

In many domains, collecting sufficient labeled training data for supervised machine learning requires easily accessible but noisy sources, such as crowdsourcing services or tagged Web data. Noisy labels occur frequently in data sets…

Machine Learning · Computer Science 2018-11-16 Matthew Klawonn, Eric Heim, James Hendler

Learning to Learn in a Semi-Supervised Fashion

To address semi-supervised learning from both labeled and unlabeled data, we present a novel meta-learning scheme. We particularly consider that labeled and unlabeled data share disjoint ground truth label sets, which can be seen in tasks like…

Computer Vision and Pattern Recognition · Computer Science 2020-08-26 Yun-Chun Chen, Chao-Te Chou, Yu-Chiang Frank Wang

Cost-minimising strategies for data labelling : optimal stopping and active learning

Supervised learning deals with the inference of a distribution over an output or label space $\CY$ conditioned on points in an observation space $\CX$, given a training dataset $D$ of pairs in $\CX \times \CY$. However, in a lot of…

Machine Learning · Computer Science 2007-11-15 Christos Dimitrakakis, Christian Savu-Krohn

Efficient Human Computation

Collecting large labeled data sets is a laborious and expensive task, whose scaling up requires division of the labeling workload between many teachers. When the number of classes is large, miscorrespondences between the labels given by the…

Machine Learning · Computer Science 2009-03-09 Ran Gilad-Bachrach, Aharon Bar-Hillel, Liat Ein-Dor

Data Consistency for Weakly Supervised Learning

In many applications, training machine learning models involves using large amounts of human-annotated data. Obtaining precise labels for the data is expensive. Instead, training with weak supervision provides a low-cost alternative. We…

Machine Learning · Computer Science 2022-02-09 Chidubem Arachie, Bert Huang

Active Learning: Problem Settings and Recent Developments

In supervised learning, acquiring labeled training data for a predictive model can be very costly, but acquiring a large amount of unlabeled data is often quite easy. Active learning is a method of obtaining predictive models with high…

Machine Learning · Computer Science 2020-12-17 Hideitsu Hino