Related papers: SAT: Improving Semi-Supervised Text Classification…

Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models

Active learning is an iterative labeling process that is used to obtain a small labeled subset, despite the absence of labeled data, thereby enabling to train a model for supervised tasks such as text classification. While active learning…

Computation and Language · Computer Science 2024-10-07 Christopher Schröder, Gerhard Heyer

Incremental Self-training for Semi-supervised Learning

Semi-supervised learning provides a solution to reduce the dependency of machine learning on labeled data. As one of the efficient semi-supervised techniques, self-training (ST) has received increasing attention. Several advancements have…

Machine Learning · Computer Science 2024-04-22 Jifeng Guo, Zhulin Liu, Tong Zhang, C. L. Philip Chen

Enhancing Self-Training Methods

Semi-supervised learning approaches train on small sets of labeled data along with large sets of unlabeled data. Self-training is a semi-supervised teacher-student approach that often suffers from the problem of "confirmation bias" that…

Machine Learning · Computer Science 2023-01-19 Aswathnarayan Radhakrishnan, Jim Davis, Zachary Rabin, Benjamin Lewis, Matthew Scherreik, Roman Ilin

LST: Lexicon-Guided Self-Training for Few-Shot Text Classification

Self-training provides an effective means of using an extremely small amount of labeled data to create pseudo-labels for unlabeled data. Many state-of-the-art self-training approaches hinge on different regularization methods to prevent…

Computation and Language · Computer Science 2022-02-08 Hazel Kim, Jaeman Son, Yo-Sub Han

Self-training Improves Pre-training for Natural Language Understanding

Unsupervised pre-training has led to much recent progress in natural language understanding. In this paper, we study self-training as another way to leverage unlabeled data through semi-supervised learning. To obtain additional data for a…

Computation and Language · Computer Science 2020-10-06 Jingfei Du, Edouard Grave, Beliz Gunel, Vishrav Chaudhary, Onur Celebi, Michael Auli, Ves Stoyanov, Alexis Conneau

SST: Self-training with Self-adaptive Thresholding for Semi-supervised Learning

Neural networks have demonstrated exceptional performance in supervised learning, benefiting from abundant high-quality annotated data. However, obtaining such data in real-world scenarios is costly and labor-intensive. Semi-supervised…

Machine Learning · Computer Science 2025-06-03 Shuai Zhao, Heyan Huang, Xinge Li, Xiaokang Chen, Rui Wang

Credal Self-Supervised Learning

Self-training is an effective approach to semi-supervised learning. The key idea is to let the learner itself iteratively generate "pseudo-supervision" for unlabeled instances based on its current hypothesis. In combination with consistency…

Machine Learning · Statistics 2021-11-05 Julian Lienen, Eyke Hüllermeier

Meta Co-Training: Two Views are Better than One

In many critical computer vision scenarios unlabeled data is plentiful, but labels are scarce and difficult to obtain. As a result, semi-supervised learning which leverages unlabeled data to boost the performance of supervised classifiers…

Computer Vision and Pattern Recognition · Computer Science 2025-05-29 Jay C. Rothenberger, Dimitrios I. Diochnos

Semi-Supervised Text Classification via Self-Pretraining

We present a neural semi-supervised learning model termed Self-Pretraining. Our model is inspired by the classic self-training algorithm. However, as opposed to self-training, Self-Pretraining is threshold-free, it can potentially update…

Computation and Language · Computer Science 2021-10-01 Payam Karisani, Negin Karisani

ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation

Self-training via pseudo labeling is a conventional, simple, and popular pipeline to leverage unlabeled data. In this work, we first construct a strong baseline of self-training (namely ST) for semi-supervised semantic segmentation via…

Computer Vision and Pattern Recognition · Computer Science 2022-03-04 Lihe Yang, Wei Zhuo, Lei Qi, Yinghuan Shi, Yang Gao

Rethinking Semi-supervised Learning with Language Models

Semi-supervised learning (SSL) is a popular setting aiming to effectively utilize unlabelled data to improve model performance in downstream natural language processing (NLP) tasks. Currently, there are two popular approaches to make use of…

Computation and Language · Computer Science 2023-05-23 Zhengxiang Shi, Francesco Tonolini, Nikolaos Aletras, Emine Yilmaz, Gabriella Kazai, Yunlong Jiao

Self-supervised Label Augmentation via Input Transformations

Self-supervised learning, which learns by constructing artificial labels given only the input signals, has recently gained considerable attention for learning representations with unlabeled datasets, i.e., learning without any…

Machine Learning · Computer Science 2020-06-30 Hankook Lee, Sung Ju Hwang, Jinwoo Shin

Uncertainty-aware Self-training for Text Classification with Few Labels

Recent success of large-scale pre-trained language models crucially hinge on fine-tuning them on large amounts of labeled data for the downstream task, that are typically expensive to acquire. In this work, we study self-training as one of…

Computation and Language · Computer Science 2020-06-30 Subhabrata Mukherjee, Ahmed Hassan Awadallah

Conformal Credal Self-Supervised Learning

In semi-supervised learning, the paradigm of self-training refers to the idea of learning from pseudo-labels suggested by the learner itself. Across various domains, corresponding methods have proven effective and achieve state-of-the-art…

Machine Learning · Statistics 2023-06-12 Julian Lienen, Caglar Demir, Eyke Hüllermeier

Barely-Supervised Learning: Semi-Supervised Learning with very few labeled images

This paper tackles the problem of semi-supervised learning when the set of labeled samples is limited to a small number of images per class, typically less than 10, problem that we refer to as barely-supervised learning. We analyze in depth…

Computer Vision and Pattern Recognition · Computer Science 2021-12-23 Thomas Lucas, Philippe Weinzaepfel, Gregory Rogez

SCAT: Robust Self-supervised Contrastive Learning via Adversarial Training for Text Classification

Despite their promising performance across various natural language processing (NLP) tasks, current NLP systems are vulnerable to textual adversarial attacks. To defend against these attacks, most existing methods apply adversarial training…

Computation and Language · Computer Science 2023-07-06 Junjie Wu, Dit-Yan Yeung

Test-Time Adaptation via Self-Training with Nearest Neighbor Information

Test-time adaptation (TTA) aims to adapt a trained classifier using online unlabeled test data only, without any information related to the training procedure. Most existing TTA methods adapt the trained classifier using the classifier's…

Computer Vision and Pattern Recognition · Computer Science 2023-03-01 Minguk Jang, Sae-Young Chung, Hye Won Chung

The Role of Pseudo-labels in Self-training Linear Classifiers on High-dimensional Gaussian Mixture Data

Self-training (ST) is a simple yet effective semi-supervised learning method. However, why and how ST improves generalization performance by using potentially erroneous pseudo-labels is still not well understood. To deepen the understanding…

Machine Learning · Statistics 2024-05-08 Takashi Takahashi

Rank-Aware Negative Training for Semi-Supervised Text Classification

Semi-supervised text classification-based paradigms (SSTC) typically employ the spirit of self-training. The key idea is to train a deep classifier on limited labeled texts and then iteratively predict the unlabeled texts as their…

Computation and Language · Computer Science 2023-06-14 Ahmed Murtadha, Shengfeng Pan, Wen Bo, Jianlin Su, Xinxin Cao, Wenze Zhang, Yunfeng Liu

Statistical and Algorithmic Insights for Semi-supervised Learning with Self-training

Self-training is a classical approach in semi-supervised learning which is successfully applied to a variety of machine learning problems. Self-training algorithm generates pseudo-labels for the unlabeled examples and progressively refines…

Machine Learning · Computer Science 2020-06-22 Samet Oymak, Talha Cihad Gulcu