Related papers: Label-Efficient Self-Training for Attribute Extrac…

Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models

Active learning is an iterative labeling process that is used to obtain a small labeled subset, despite the absence of labeled data, thereby enabling to train a model for supervised tasks such as text classification. While active learning…

Computation and Language · Computer Science · 2024-10-07 · Christopher Schröder, Gerhard Heyer

LST: Lexicon-Guided Self-Training for Few-Shot Text Classification

Self-training provides an effective means of using an extremely small amount of labeled data to create pseudo-labels for unlabeled data. Many state-of-the-art self-training approaches hinge on different regularization methods to prevent…

Computation and Language · Computer Science · 2022-02-08 · Hazel Kim, Jaeman Son, Yo-Sub Han

Noise-Aware Training of Layout-Aware Language Models

A visually rich document (VRD) utilizes visual features along with linguistic cues to disseminate information. Training a custom extractor that identifies named entities from a document requires a large number of instances of the target…

Computation and Language · Computer Science · 2024-04-02 · Ritesh Sarkhel, Xiaoqi Ren, Lauro Beltrao Costa, Guolong Su, Vincent Perot, Yanan Xie, Emmanouil Koukoumidis, Arnab Nandi

Label-Specific Training Set Construction from Web Resource for Image Annotation

Recently many research efforts have been devoted to image annotation by leveraging on the associated tags/keywords of web images as training labels. A key issue to resolve is the relatively low accuracy of the tags. In this paper, we…

Multimedia · Computer Science · 2011-07-15 · Jinhui Tang, Shuicheng Yan, Tat-Seng Chua, Ramesh Jain

CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web

The web contains countless semi-structured websites, which can be a rich source of information for populating knowledge bases. Existing methods for extracting relations from the DOM trees of semi-structured webpages can achieve high…

Artificial Intelligence · Computer Science · 2018-04-13 · Colin Lockard, Xin Luna Dong, Arash Einolghozati, Prashant Shiralkar

Incremental Self-training for Semi-supervised Learning

Semi-supervised learning provides a solution to reduce the dependency of machine learning on labeled data. As one of the efficient semi-supervised techniques, self-training (ST) has received increasing attention. Several advancements have…

Machine Learning · Computer Science · 2024-04-22 · Jifeng Guo, Zhulin Liu, Tong Zhang, C. L. Philip Chen

Uncertainty-aware Self-training for Text Classification with Few Labels

Recent success of large-scale pre-trained language models crucially hinge on fine-tuning them on large amounts of labeled data for the downstream task, that are typically expensive to acquire. In this work, we study self-training as one of…

Computation and Language · Computer Science · 2020-06-30 · Subhabrata Mukherjee, Ahmed Hassan Awadallah

Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks

Text categorization is an essential task in Web content analysis. Considering the ever-evolving Web data and new emerging categories, instead of the laborious supervised setting, in this paper, we focus on the minimally-supervised setting…

Computation and Language · Computer Science · 2021-02-24 · Xinyang Zhang, Chenwei Zhang, Luna Xin Dong, Jingbo Shang, Jiawei Han

ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation

Self-training via pseudo labeling is a conventional, simple, and popular pipeline to leverage unlabeled data. In this work, we first construct a strong baseline of self-training (namely ST) for semi-supervised semantic segmentation via…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-04 · Lihe Yang, Wei Zhuo, Lei Qi, Yinghuan Shi, Yang Gao

Semi-supervised Relation Extraction via Incremental Meta Self-Training

To alleviate human efforts from obtaining large-scale annotations, Semi-Supervised Relation Extraction methods aim to leverage unlabeled data in addition to learning from limited samples. Existing self-training methods suffer from the…

Computation and Language · Computer Science · 2021-09-13 · Xuming Hu, Chenwei Zhang, Fukun Ma, Chenyao Liu, Lijie Wen, Philip S. Yu

Unsupervised Selective Labeling for More Effective Semi-Supervised Learning

Given an unlabeled dataset and an annotation budget, we study how to selectively label a fixed number of instances so that semi-supervised learning (SSL) on such a partially labeled dataset is most effective. We focus on selecting the right…

Machine Learning · Computer Science · 2023-08-24 · Xudong Wang, Long Lian, Stella X. Yu

Reliable Label Bootstrapping for Semi-Supervised Learning

Reducing the amount of labels required to train convolutional neural networks without performance degradation is key to effectively reduce human annotation efforts. We propose Reliable Label Bootstrapping (ReLaB), an unsupervised…

Computer Vision and Pattern Recognition · Computer Science · 2021-02-26 · Paul Albert, Diego Ortego, Eric Arazo, Noel E. O'Connor, Kevin McGuinness

Pseudo-Label Noise Suppression Techniques for Semi-Supervised Semantic Segmentation

Semi-supervised learning (SSL) can reduce the need for large labelled datasets by incorporating unlabelled data into the training. This is particularly interesting for semantic segmentation, where labelling data is very costly and…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-20 · Sebastian Scherer, Robin Schön, Rainer Lienhart

Self-Training with Weak Supervision

State-of-the-art deep neural networks require large-scale labeled training data that is often expensive to obtain or not available for many tasks. Weak supervision in the form of domain-specific rules has been shown to be useful in such…

Computation and Language · Computer Science · 2021-04-13 · Giannis Karamanolakis, Subhabrata Mukherjee, Guoqing Zheng, Ahmed Hassan Awadallah

Adaptive Self-training for Few-shot Neural Sequence Labeling

Sequence labeling is an important technique employed for many Natural Language Processing (NLP) tasks, such as Named Entity Recognition (NER), slot tagging for dialog systems and semantic parsing. Large-scale pre-trained language models…

Computation and Language · Computer Science · 2020-12-14 · Yaqing Wang, Subhabrata Mukherjee, Haoda Chu, Yuancheng Tu, Ming Wu, Jing Gao, Ahmed Hassan Awadallah

PseudoSeg: Designing Pseudo Labels for Semantic Segmentation

Recent advances in semi-supervised learning (SSL) demonstrate that a combination of consistency regularization and pseudo-labeling can effectively improve image classification accuracy in the low-data regime. Compared to classification,…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-31 · Yuliang Zou, Zizhao Zhang, Han Zhang, Chun-Liang Li, Xiao Bian, Jia-Bin Huang, Tomas Pfister

SCRIBES: Web-Scale Script-Based Semi-Structured Data Extraction with Reinforcement Learning

Semi-structured content in HTML tables, lists, and infoboxes accounts for a substantial share of factual data on the web, yet the formatting complicates usage, and reliably extracting structured information from them remains challenging.…

Computation and Language · Computer Science · 2025-10-03 · Shicheng Liu, Kai Sun, Lisheng Fu, Xilun Chen, Xinyuan Zhang, Zhaojiang Lin, Rulin Shao, Yue Liu, Anuj Kumar, Wen-tau Yih, Xin Luna Dong

Learning to Self-Train for Semi-Supervised Few-Shot Classification

Few-shot classification (FSC) is challenging due to the scarcity of labeled training data (e.g. only one labeled data point per class). Meta-learning has shown to achieve promising results by learning to initialize a classification model…

Computer Vision and Pattern Recognition · Computer Science · 2019-10-01 · Xinzhe Li, Qianru Sun, Yaoyao Liu, Shibao Zheng, Qin Zhou, Tat-Seng Chua, Bernt Schiele

Label-Efficient Dataset Pruning via Semi-Supervised Pseudo-Labeling

Dataset pruning reduces the storage and training costs of deep learning by selecting an informative subset from a large dataset. However, most existing pruning methods require fully labeled data, which limits their applicability in…

Machine Learning · Computer Science · 2026-05-25 · Yeseul Cho, Baekrok Shin, Changmin Kang, Chulhee Yun

Neural Networks Against (and For) Self-Training: Classification with Small Labeled and Large Unlabeled Sets

We propose a semi-supervised text classifier based on self-training using one positive and one negative property of neural networks. One of the weaknesses of self-training is the semantic drift problem, where noisy pseudo-labels accumulate…

Computation and Language · Computer Science · 2024-01-02 · Payam Karisani