Related papers: A Benchmark Generative Probabilistic Model for Wea…

Multi-Level Generative Models for Partial Label Learning with Non-random Label Noise

Partial label (PL) learning tackles the problem where each training instance is associated with a set of candidate labels that include both the true label and irrelevant noise labels. In this paper, we propose a novel multi-level generative…

Machine Learning · Computer Science 2020-05-13 Yan Yan, Yuhong Guo

Revisiting Self-Training for Few-Shot Learning of Language Model

As unlabeled data carry rich task-relevant information, they are proven useful for few-shot learning of language model. The question is how to effectively make use of such data. In this work, we revisit the self-training technique for…

Computation and Language · Computer Science 2021-10-05 Yiming Chen, Yan Zhang, Chen Zhang, Grandee Lee, Ran Cheng, Haizhou Li

Semi-Supervised Regression with Heteroscedastic Pseudo-Labels

Pseudo-labeling is a commonly used paradigm in semi-supervised learning, yet its application to semi-supervised regression (SSR) remains relatively under-explored. Unlike classification, where pseudo-labels are discrete and confidence-based…

Machine Learning · Computer Science 2025-10-20 Xueqing Sun, Renzhen Wang, Quanziang Wang, Yichen Wu, Xixi Jia, Deyu Meng

LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models Using in-the-wild Data

Although state-of-the-art Speech Foundational Models can produce high-quality text pseudo-labels, applying Semi-Supervised Learning (SSL) for in-the-wild real-world data remains challenging due to its richer and more complex acoustics…

Computation and Language · Computer Science 2026-03-16 Wen Ding, Fan Qian

Towards Semi-supervised Learning with Non-random Missing Labels

Semi-supervised learning (SSL) tackles the label missing problem by enabling the effective usage of unlabeled data. While existing SSL methods focus on the traditional setting, a practical and challenging scenario called label Missing Not…

Machine Learning · Computer Science 2023-08-21 Yue Duan, Zhen Zhao, Lei Qi, Luping Zhou, Lei Wang, Yinghuan Shi

Boosting Semi-Supervised Learning by Exploiting All Unlabeled Data

Semi-supervised learning (SSL) has attracted enormous attention due to its vast potential of mitigating the dependence on large labeled datasets. The latest methods (e.g., FixMatch) use a combination of consistency regularization and…

Computer Vision and Pattern Recognition · Computer Science 2023-03-21 Yuhao Chen, Xin Tan, Borui Zhao, Zhaowei Chen, Renjie Song, Jiajun Liang, Xuequan Lu

Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding

The recent success of large pre-trained language models (PLMs) heavily hinges on massive labeled data, which typically produces inferior performance in low-resource scenarios. To remedy this dilemma, we study self-training as one of the…

Machine Learning · Computer Science 2023-10-23 Jianing Wang, Qiushi Sun, Nuo Chen, Chengyu Wang, Jun Huang, Ming Gao, Xiang Li

More Reliable Pseudo-labels, Better Performance: A Generalized Approach to Single Positive Multi-label Learning

Multi-label learning is a challenging computer vision task that requires assigning multiple categories to each image. However, fully annotating large-scale datasets is often impractical due to high costs and effort, motivating the study of…

Computer Vision and Pattern Recognition · Computer Science 2025-08-29 Luong Tran, Thieu Vo, Anh Nguyen, Sang Dinh, Van Nguyen

Pseudo Contrastive Learning for Graph-based Semi-supervised Learning

Pseudo Labeling is a technique used to improve the performance of semi-supervised Graph Neural Networks (GNNs) by generating additional pseudo-labels based on confident predictions. However, the quality of generated pseudo-labels has been a…

Machine Learning · Computer Science 2023-12-20 Weigang Lu, Ziyu Guan, Wei Zhao, Yaming Yang, Yuanhai Lv, Lining Xing, Baosheng Yu, Dacheng Tao

Likelihood-based semi-supervised model selection with applications to speech processing

In conventional supervised pattern recognition tasks, model selection is typically accomplished by minimizing the classification error rate on a set of so-called development data, subject to ground-truth labeling by human experts or some…

Machine Learning · Statistics 2011-08-25 Christopher M. White, Sanjeev P. Khudanpur, Patrick J. Wolfe

Constrained Labeling for Weakly Supervised Learning

Curation of large fully supervised datasets has become one of the major roadblocks for machine learning. Weak supervision provides an alternative to supervised learning by training with cheap, noisy, and possibly correlated labeling…

Machine Learning · Computer Science 2021-06-01 Chidubem Arachie, Bert Huang

Revisiting Vicinal Risk Minimization for Partially Supervised Multi-Label Classification Under Data Scarcity

Due to the high human cost of annotation, it is non-trivial to curate a large-scale medical dataset that is fully labeled for all classes of interest. Instead, it would be convenient to collect multiple small partially labeled datasets from…

Machine Learning · Computer Science 2022-04-20 Nanqing Dong, Jiayi Wang, Irina Voiculescu

Leveraging Large Language Models for Knowledge-free Weak Supervision in Clinical Natural Language Processing

The performance of deep learning-based natural language processing systems is based on large amounts of labeled training data which, in the clinical domain, are not easily available or affordable. Weak supervision and in-context learning…

Computation and Language · Computer Science 2025-04-02 Enshuo Hsu, Kirk Roberts

Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning

Recent studies have revealed the intriguing few-shot learning ability of pretrained language models (PLMs): They can quickly adapt to a new task when fine-tuned on a small amount of labeled data formulated as prompts, without requiring…

Computation and Language · Computer Science 2023-05-15 Yu Meng, Martin Michalski, Jiaxin Huang, Yu Zhang, Tarek Abdelzaher, Jiawei Han

Lifting Weak Supervision To Structured Prediction

Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates from a variety of sources. WS is theoretically well understood for binary classification, where…

Machine Learning · Computer Science 2022-11-28 Harit Vishwakarma, Nicholas Roberts, Frederic Sala

Learning from True-False Labels via Multi-modal Prompt Retrieving

Pre-trained Vision-Language Models (VLMs) exhibit strong zero-shot classification abilities, demonstrating great potential for generating weakly supervised labels. Unfortunately, existing weakly supervised learning methods are short of…

Machine Learning · Computer Science 2025-06-04 Zhongnian Li, Jinghao Xu, Peng Ying, Meng Wei, Xinzheng Xu

Semi-Supervised Deep Regression with Uncertainty Consistency and Variational Model Ensembling via Bayesian Neural Networks

Deep regression is an important problem with numerous applications. These range from computer vision tasks such as age estimation from photographs, to medical tasks such as ejection fraction estimation from echocardiograms for disease…

Computer Vision and Pattern Recognition · Computer Science 2023-02-16 Weihang Dai, Xiaomeng Li, Kwang-Ting Cheng

WALNUT: A Benchmark on Semi-weakly Supervised Learning for Natural Language Understanding

Building machine learning models for natural language understanding (NLU) tasks relies heavily on labeled data. Weak supervision has been proven valuable when large amount of labeled data is unavailable or expensive to obtain. Existing…

Computation and Language · Computer Science 2022-05-24 Guoqing Zheng, Giannis Karamanolakis, Kai Shu, Ahmed Hassan Awadallah

Mitigating the Antigenic Data Bottleneck: Semi-supervised Learning with Protein Language Models for Influenza A Surveillance

Influenza A viruses (IAVs) evolve antigenically at a pace that requires frequent vaccine updates, yet the haemagglutination inhibition (HI) assays used to quantify antigenicity are labor-intensive and unscalable. As a result, genomic data…

Machine Learning · Computer Science 2025-12-08 Yanhua Xu

Robust Partial-Label Learning by Leveraging Class Activation Values

Real-world training data is often noisy; for example, human annotators assign conflicting class labels to the same instances. Partial-label learning (PLL) is a weakly supervised learning paradigm that allows training classifiers in this…

Machine Learning · Computer Science 2025-10-27 Tobias Fuchs, Florian Kalinke