Related papers: Enhancing Self-Training Methods

Self-Training: A Survey

Semi-supervised algorithms aim to learn prediction functions from a small set of labeled observations and a large set of unlabeled observations. Because this framework is relevant in many applications, such algorithms have received a lot of interest…

Machine Learning · Computer Science · 2025-02-17 · Massih-Reza Amini, Vasilii Feofanov, Loic Pauletto, Lies Hadjadj, Emilie Devijver, Yury Maximov

In all LikelihoodS: How to Reliably Select Pseudo-Labeled Data for Self-Training in Semi-Supervised Learning

Self-training is a simple yet effective method within semi-supervised learning. The idea is to iteratively enhance training data by adding pseudo-labeled data. Its generalization performance heavily depends on the selection of these…

Machine Learning · Statistics · 2023-03-03 · Julian Rodemann, Christoph Jansen, Georg Schollmeyer, Thomas Augustin
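Several abstracts in this list describe the same basic loop. First, fit a model on the labeled data. Next, predict pseudo-labels for the unlabeled data and add the selected ones to the training set. Then refit and repeat. The Python sketch below illustrates that loop under explicit assumptions. A toy nearest-centroid classifier with softmax scores stands in for the model. A fixed confidence cutoff (the hypothetical `threshold` parameter) stands in for the selection rule. The helper names `fit_centroids`, `predict_proba` and `self_train` are illustrative only. This is not the selection criterion proposed in any of the listed papers. As the Rodemann et al. abstract notes, generalization depends heavily on how pseudo-labeled data are selected, and naive thresholding is exactly the step those papers try to improve.

```python
import math


def fit_centroids(xs, ys):
    """Toy model (assumption): the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict_proba(centroids, x):
    """Softmax over negative squared distances to each class centroid."""
    scores = {y: -sum((a - b) ** 2 for a, b in zip(x, c)) for y, c in centroids.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.9, max_rounds=10):
    """Generic self-training with a fixed confidence threshold (hypothetical rule)."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        model = fit_centroids(xs, ys)  # refit on labeled + pseudo-labeled data
        remaining, added = [], 0
        for x in pool:
            probs = predict_proba(model, x)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                # Confident prediction: keep it as a pseudo-label.
                xs.append(x)
                ys.append(label)
                added += 1
            else:
                remaining.append(x)  # retry in the next round
        pool = remaining
        if added == 0:  # nothing new was selected, so stop
            break
    return fit_centroids(xs, ys), xs, ys


if __name__ == "__main__":
    model, xs, ys = self_train([[0.0], [10.0]], ["a", "b"], [[1.0], [2.0], [9.0], [5.0]])
    print(model, ys)
```

With one labeled point per class, the unlabeled points near each centroid are pseudo-labeled in the first round. The ambiguous midpoint `[5.0]` is rejected at first (both classes score 0.5). It is accepted in the second round only because the earlier pseudo-labels moved the centroids. This shows how early pseudo-labels shape later ones, the confirmation-bias concern named in the Arazo et al. title.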

Statistical and Algorithmic Insights for Semi-supervised Learning with Self-training

Self-training is a classical approach in semi-supervised learning which is successfully applied to a variety of machine learning problems. The self-training algorithm generates pseudo-labels for the unlabeled examples and progressively refines…

Machine Learning · Computer Science · 2020-06-22 · Samet Oymak, Talha Cihad Gulcu

Debiased Self-Training for Semi-Supervised Learning

Deep neural networks achieve remarkable performance on a wide range of tasks with the aid of large-scale labeled datasets. Yet these datasets are time-consuming and labor-intensive to obtain on realistic tasks. To mitigate the requirement…

Machine Learning · Computer Science · 2022-11-10 · Baixu Chen, Junguang Jiang, Ximei Wang, Pengfei Wan, Jianmin Wang, Mingsheng Long

Revisiting Self-Training with Regularized Pseudo-Labeling for Tabular Data

Recent progress in semi- and self-supervised learning has caused a rift in the long-held belief about the need for an enormous amount of labeled data for machine learning and the irrelevancy of unlabeled data. Although it has been…

Machine Learning · Computer Science · 2023-03-14 · Minwook Kim, Juseong Kim, Giltae Song

Self-training Improves Pre-training for Natural Language Understanding

Unsupervised pre-training has led to much recent progress in natural language understanding. In this paper, we study self-training as another way to leverage unlabeled data through semi-supervised learning. To obtain additional data for a…

Computation and Language · Computer Science · 2020-10-06 · Jingfei Du, Edouard Grave, Beliz Gunel, Vishrav Chaudhary, Onur Celebi, Michael Auli, Ves Stoyanov, Alexis Conneau

Self Training with Ensemble of Teacher Models

In order to train robust deep learning models, large amounts of labelled data are required. However, in the absence of such large repositories of labelled data, unlabeled data can be exploited for the same purpose. Semi-Supervised learning aims to…

Machine Learning · Computer Science · 2021-07-20 · Soumyadeep Ghosh, Sanjay Kumar, Janu Verma, Awanish Kumar

How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis

Self-training, a semi-supervised learning algorithm, leverages a large amount of unlabeled data to improve learning when the labeled data are limited. Despite empirical successes, its theoretical characterization remains elusive. To the…

Machine Learning · Computer Science · 2022-02-15 · Shuai Zhang, Meng Wang, Sijia Liu, Pin-Yu Chen, Jinjun Xiong

Credal Self-Supervised Learning

Self-training is an effective approach to semi-supervised learning. The key idea is to let the learner itself iteratively generate "pseudo-supervision" for unlabeled instances based on its current hypothesis. In combination with consistency…

Machine Learning · Statistics · 2021-11-05 · Julian Lienen, Eyke Hüllermeier

Improvability Through Semi-Supervised Learning: A Survey of Theoretical Results

Semi-supervised learning is a setting in which one has labeled and unlabeled data available. In this survey we explore different types of theoretical results when one uses unlabeled data in classification and regression tasks. Most methods…

Machine Learning · Computer Science · 2020-07-31 · Alexander Mey, Marco Loog

Pseudo-Labeling and Confirmation Bias in Deep Semi-Supervised Learning

Semi-supervised learning, i.e. jointly learning from labeled and unlabeled samples, is an active research topic due to its key role in relaxing human supervision. In the context of image classification, recent advances to learn from…

Computer Vision and Pattern Recognition · Computer Science · 2020-06-30 · Eric Arazo, Diego Ortego, Paul Albert, Noel E. O'Connor, Kevin McGuinness

Doubly Robust Self-Training

Self-training is an important technique for solving semi-supervised learning problems. It leverages unlabeled data by generating pseudo-labels and combining them with a limited labeled dataset for training. The effectiveness of…

Machine Learning · Computer Science · 2023-11-06 · Banghua Zhu, Mingyu Ding, Philip Jacobson, Ming Wu, Wei Zhan, Michael Jordan, Jiantao Jiao

Revisiting Self-Training for Neural Sequence Generation

Self-training is one of the earliest and simplest semi-supervised methods. The key idea is to augment the original labeled dataset with unlabeled data paired with the model's prediction (i.e. the pseudo-parallel data). While self-training…

Machine Learning · Computer Science · 2020-10-20 · Junxian He, Jiatao Gu, Jiajun Shen, Marc'Aurelio Ranzato

Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains

There has been increased interest in devising learning techniques that combine unlabeled data with labeled data, i.e. semi-supervised learning. However, to the best of our knowledge, no study has been performed across various techniques…

Machine Learning · Computer Science · 2011-09-12 · N. V. Chawla, Grigoris Karakoulas

Feedback-Driven Pseudo-Label Reliability Assessment: Redefining Thresholding for Semi-Supervised Semantic Segmentation

Semi-supervised learning leverages unlabeled data to enhance model performance, addressing the limitations of fully supervised approaches. Among its strategies, pseudo-supervision has proven highly effective, typically relying on one or…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-13 · Negin Ghamsarian, Sahar Nasirihaghighi, Klaus Schoeffmann, Raphael Sznitman

Uncertainty-aware Self-training for Text Classification with Few Labels

The recent success of large-scale pre-trained language models crucially hinges on fine-tuning them on large amounts of labeled data for the downstream task, which are typically expensive to acquire. In this work, we study self-training as one of…

Computation and Language · Computer Science · 2020-06-30 · Subhabrata Mukherjee, Ahmed Hassan Awadallah

Neural Networks Against (and For) Self-Training: Classification with Small Labeled and Large Unlabeled Sets

We propose a semi-supervised text classifier based on self-training using one positive and one negative property of neural networks. One of the weaknesses of self-training is the semantic drift problem, where noisy pseudo-labels accumulate…

Computation and Language · Computer Science · 2024-01-02 · Payam Karisani

Boosting Supervised Learning Performance with Co-training

Deep learning perception models require a massive amount of labeled training data to achieve good performance. While unlabeled data is easy to acquire, the cost of labeling is prohibitive and could create a tremendous burden on companies or…

Computer Vision and Pattern Recognition · Computer Science · 2021-11-19 · Xinnan Du, William Zhang, Jose M. Alvarez

Conformal Credal Self-Supervised Learning

In semi-supervised learning, the paradigm of self-training refers to the idea of learning from pseudo-labels suggested by the learner itself. Across various domains, corresponding methods have proven effective and achieve state-of-the-art…

Machine Learning · Statistics · 2023-06-12 · Julian Lienen, Caglar Demir, Eyke Hüllermeier

Rethinking Self-training for Semi-supervised Landmark Detection: A Selection-free Approach

Self-training is a simple yet effective method for semi-supervised learning, during which pseudo-label selection plays an important role in handling confirmation bias. Despite its popularity, applying self-training to landmark detection…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-17 · Haibo Jin, Haoxuan Che, Hao Chen