Related papers: A Benchmark Generative Probabilistic Model for Wea…

AutoWS: Automated Weak Supervision Framework for Text Classification

Creating large, good-quality labeled data has become one of the major bottlenecks for developing machine learning applications. Multiple techniques have been developed to either decrease the dependence on labeled data (zero/few-shot…

Computation and Language · Computer Science · 2023-02-08 · Abhinav Bohra, Huy Nguyen, Devashish Khatwani

Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data

Fine-tuning vision-language models (VLMs) with abundant unlabeled data has recently attracted increasing attention. Existing methods that resort to the pseudolabeling strategy suffer from heavily incorrect hard pseudolabels when VLMs…

Machine Learning · Computer Science · 2024-06-18 · Jiahan Zhang, Qi Wei, Feng Liu, Lei Feng

SepLL: Separating Latent Class Labels from Weak Supervision Noise

In the weakly supervised learning paradigm, labeling functions automatically assign heuristic, often noisy, labels to data samples. In this work, we provide a method for learning from weak labels by separating two types of complementary…

Machine Learning · Computer Science · 2022-10-26 · Andreas Stephan, Vasiliki Kougia, Benjamin Roth

Label Augmentation with Reinforced Labeling for Weak Supervision

Weak supervision (WS) is an alternative to traditional supervised learning that addresses the need for ground truth. Data programming is a practical WS approach that allows programmatic labeling of data samples using labeling functions (LFs)…

Machine Learning · Computer Science · 2022-04-14 · Gürkan Solmaz, Flavio Cirillo, Fabio Maresca, Anagha Gode Anil Kumar
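
The data-programming idea summarized above (several labeling functions each vote on a sample, and the votes are combined into one label) can be illustrated with a minimal sketch. It assumes each LF is a plain Python callable that returns a class index or `ABSTAIN = -1`, and it combines votes by simple majority. The LFs, the two-class task, and all names are hypothetical; this is a generic baseline, not the reinforced-labeling method of Solmaz et al.

```python
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN = -1  # hypothetical convention: an LF returns -1 when it declines to vote

LabelingFunction = Callable[[str], int]

def lf_contains_refund(text: str) -> int:
    # Hypothetical LF: vote class 1 ("complaint") when the word "refund" appears.
    return 1 if "refund" in text.lower() else ABSTAIN

def lf_contains_thanks(text: str) -> int:
    # Hypothetical LF: vote class 0 ("praise") when the text thanks someone.
    return 0 if "thank" in text.lower() else ABSTAIN

def lf_exclamation(text: str) -> int:
    # Hypothetical LF: vote class 1 for emphatic messages ending in "!".
    return 1 if text.endswith("!") else ABSTAIN

def majority_vote(text: str, lfs: List[LabelingFunction]) -> Optional[int]:
    """Return the most common non-abstain vote, or None if every LF abstains or the top votes tie."""
    counts = Counter(v for v in (lf(text) for lf in lfs) if v != ABSTAIN)
    if not counts:
        return None  # no LF fired: the sample stays unlabeled
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave unlabeled rather than guess
    return ranked[0][0]

if __name__ == "__main__":
    lfs = [lf_contains_refund, lf_contains_thanks, lf_exclamation]
    for sample in ["I want a refund!", "Thank you so much", "Thanks, but refund me", "ok"]:
        print(repr(sample), "->", majority_vote(sample, lfs))
```

Majority vote treats every LF as equally reliable; probabilistic label models that learn per-LF accuracies are the usual next step beyond this baseline.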

Generalized Weak Supervision for Neural Information Retrieval

Neural ranking models (NRMs) have demonstrated effective performance in several information retrieval (IR) tasks. However, training NRMs often requires large-scale training data, which is difficult and expensive to obtain. To address this…

Information Retrieval · Computer Science · 2023-04-19 · Yen-Chieh Lien, Hamed Zamani, W. Bruce Croft

Zero-Shot Grammar Competency Estimation Using Large Language Model Generated Pseudo Labels

Grammar competency estimation is essential for assessing linguistic proficiency in both written and spoken language; however, the spoken modality presents additional challenges due to its spontaneous, unstructured, and disfluent nature.…

Computation and Language · Computer Science · 2025-11-18 · Sourya Dipta Das, Shubham Kumar, Kuldeep Yadav

Language Models in the Loop: Incorporating Prompting into Weak Supervision

We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited. Rather than apply the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for…

Machine Learning · Computer Science · 2022-05-06 · Ryan Smith, Jason A. Fries, Braden Hancock, Stephen H. Bach

Reliable Programmatic Weak Supervision with Confidence Intervals for Label Probabilities

The accurate labeling of datasets is often both costly and time-consuming. Given an unlabeled dataset, programmatic weak supervision obtains probabilistic predictions for the labels by leveraging multiple weak labeling functions (LFs) that…

Machine Learning · Statistics · 2025-08-07 · Verónica Álvarez, Santiago Mazuelas, Steven An, Sanjoy Dasgupta

Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization

Localizing keypoints of an object is a basic visual problem. However, supervised learning of a keypoint localization network often requires a large amount of data, which is expensive and time-consuming to obtain. To remedy this, there is an…

Computer Vision and Pattern Recognition · Computer Science · 2022-01-25 · Can Wang, Sheng Jin, Yingda Guan, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang

Scene Graph Prediction with Limited Labels

Visual knowledge bases such as Visual Genome power numerous applications in computer vision, including visual question answering and captioning, but suffer from sparse, incomplete relationships. All scene graph models to date are limited to…

Computer Vision and Pattern Recognition · Computer Science · 2019-12-03 · Vincent S. Chen, Paroma Varma, Ranjay Krishna, Michael Bernstein, Christopher Ré, Li Fei-Fei

Snorkel: Rapid Training Data Creation with Weak Supervision

Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data.…

Machine Learning · Computer Science · 2017-11-29 · Alexander Ratner, Stephen H. Bach, Henry Ehrenberg, Jason Fries, Sen Wu, Christopher Ré

In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning

Recent research in semi-supervised learning (SSL) is mostly dominated by consistency-regularization-based methods, which achieve strong performance. However, they heavily rely on domain-specific data augmentations, which are not easy to…

Machine Learning · Computer Science · 2021-04-20 · Mamshad Nayeem Rizve, Kevin Duarte, Yogesh S Rawat, Mubarak Shah

PercentMatch: Percentile-based Dynamic Thresholding for Multi-Label Semi-Supervised Classification

While much recent work in semi-supervised learning (SSL) has achieved strong performance on single-label classification problems, an equally important yet underexplored problem is how to leverage the advantage of unlabeled data in…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-31 · Junxiang Huang, Alexander Huang, Beatriz C. Guerra, Yen-Yun Yu

Instance-Dependent Partial Label Learning

Partial label learning (PLL) is a typical weakly supervised learning problem, where each training example is associated with a set of candidate labels among which only one is true. Most existing PLL approaches assume that the incorrect…

Machine Learning · Computer Science · 2021-10-27 · Ning Xu, Congyu Qiao, Xin Geng, Min-Ling Zhang
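
In partial label learning, as defined in the abstract above, each training example carries a candidate set that contains exactly one true label. A common generic way to disambiguate is to renormalize the model's predicted probabilities over the candidate set and train against those renormalized values as soft targets. The sketch below assumes raw logits given as plain Python lists and uses hypothetical function names; it illustrates the problem setting only and is not the instance-dependent method proposed by Xu et al.

```python
import math
from typing import List, Sequence, Set

def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def candidate_targets(logits: Sequence[float], candidates: Set[int]) -> List[float]:
    """Renormalize predicted probabilities over the candidate set; non-candidates get 0."""
    probs = softmax(logits)
    mass = sum(probs[k] for k in candidates)
    return [probs[k] / mass if k in candidates else 0.0 for k in range(len(probs))]

def pll_loss(logits: Sequence[float], candidates: Set[int]) -> float:
    """Cross-entropy between the candidate-restricted soft targets and the model's prediction.

    In an actual training loop the targets would typically be held fixed
    (not differentiated through) and refreshed periodically.
    """
    probs = softmax(logits)
    targets = candidate_targets(logits, candidates)
    return -sum(t * math.log(p) for t, p in zip(targets, probs) if t > 0)

if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0, -1.0]
    print(candidate_targets(logits, {0, 2}))  # mass only on classes 0 and 2
    print(pll_loss(logits, {0, 2}))
```

Because non-candidate labels always receive zero target mass, the model is never pushed toward a label the annotator ruled out; the open question PLL methods address is how to split the mass among the remaining candidates.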

Knowledge Distillation in Automated Annotation: Supervised Text Classification with LLM-Generated Training Labels

Computational social science (CSS) practitioners often rely on human-labeled data to fine-tune supervised text classifiers. We assess the potential for researchers to augment or replace human-generated training data with surrogate training…

Computation and Language · Computer Science · 2024-06-26 · Nicholas Pangakis, Samuel Wolken

PseudoSeg: Designing Pseudo Labels for Semantic Segmentation

Recent advances in semi-supervised learning (SSL) demonstrate that a combination of consistency regularization and pseudo-labeling can effectively improve image classification accuracy in the low-data regime. Compared to classification,…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-31 · Yuliang Zou, Zizhao Zhang, Han Zhang, Chun-Liang Li, Xiao Bian, Jia-Bin Huang, Tomas Pfister

Semi-supervised Concept Bottleneck Models

Concept Bottleneck Models (CBMs) have garnered increasing attention due to their ability to provide concept-based explanations for black-box deep learning models while achieving high final prediction accuracy using human-like concepts.…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-21 · Lijie Hu, Tianhao Huang, Huanyi Xie, Xilin Gong, Chenyang Ren, Zhengyu Hu, Lu Yu, Ping Ma, Di Wang

A Channel-ensemble Approach: Unbiased and Low-variance Pseudo-labels is Critical for Semi-supervised Classification

Semi-supervised learning (SSL) is a practical challenge in computer vision. Pseudo-label (PL) methods, e.g., FixMatch and FreeMatch, achieve state-of-the-art (SOTA) performance in SSL. These approaches employ a threshold-to-pseudo-label…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-28 · Jiaqi Wu, Junbiao Pang, Baochang Zhang, Qingming Huang
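
The threshold-to-pseudo-label step mentioned above is, in its simplest fixed-threshold form, easy to sketch. An unlabeled example receives its arg-max class as a pseudo-label only when the model's top predicted probability reaches a threshold τ; otherwise it is skipped. This sketch assumes a single fixed threshold (0.95, the value commonly used with FixMatch) and hypothetical function names. It does not implement FreeMatch's adaptive thresholds or the channel-ensemble method of Wu et al.

```python
import math
from typing import List, Optional, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def threshold_pseudo_labels(
    batch_logits: Sequence[Sequence[float]], tau: float = 0.95
) -> List[Optional[int]]:
    """Return the arg-max class for confident examples and None for the rest."""
    labels: List[Optional[int]] = []
    for logits in batch_logits:
        probs = softmax(logits)
        confidence = max(probs)
        # Keep the pseudo-label only if the model is at least tau-confident.
        labels.append(probs.index(confidence) if confidence >= tau else None)
    return labels

if __name__ == "__main__":
    batch = [[5.0, 0.0, 0.0], [1.0, 0.9, 0.0], [0.0, 0.0, 10.0]]
    print(threshold_pseudo_labels(batch))
```

A high τ trades coverage for precision: fewer unlabeled examples contribute to training, but the pseudo-labels that do are less likely to be wrong, which is the bias/variance tension the title above refers to.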

A Unified Approach to Count-Based Weakly-Supervised Learning

High-quality labels are often very scarce, whereas unlabeled data with inferred weak labels occurs more naturally. In many cases, these weak labels dictate the frequency of each respective class over a set of instances. In this paper, we…

Machine Learning · Computer Science · 2023-11-27 · Vinay Shukla, Zhe Zeng, Kareem Ahmed, Guy Van den Broeck
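
When weak supervision arrives as class counts over a set of instances, as in the abstract above, one natural way to score a model is the probability that its per-instance predictions yield exactly the observed count. Assuming independent binary predictions p_i, that count follows a Poisson binomial distribution, which a short dynamic program computes exactly. This is a sketch under those assumptions with hypothetical function names, not necessarily the formulation used by Shukla et al.

```python
import math
from typing import List, Sequence

def count_distribution(probs: Sequence[float]) -> List[float]:
    """P(exactly k positives) for k = 0..n, assuming independent Bernoulli(p_i) instances."""
    dist = [1.0]  # with zero instances, the count is 0 with probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)  # this instance is negative: count unchanged
            new[k + 1] += q * p      # this instance is positive: count + 1
        dist = new
    return dist

def count_loss(probs: Sequence[float], k: int) -> float:
    """Negative log-likelihood that exactly k instances in the bag are positive."""
    return -math.log(count_distribution(probs)[k])

if __name__ == "__main__":
    bag = [0.2, 0.7, 0.9]  # hypothetical per-instance positive probabilities
    print(count_distribution(bag))
    print(count_loss(bag, 2))  # weak label: "2 of these 3 are positive"
```

The dynamic program is O(n²) in the bag size, and the loss is finite only when the observed count has nonzero probability (for example, when every p_i lies strictly between 0 and 1).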

Reduction-based Pseudo-label Generation for Instance-dependent Partial Label Learning

Instance-dependent Partial Label Learning (ID-PLL) aims to learn a multi-class predictive model given training instances annotated with candidate labels related to features, among which the correct label is hidden: fixed but unknown. The…

Machine Learning · Computer Science · 2024-10-29 · Congyu Qiao, Ning Xu, Yihao Hu, Xin Geng