Related papers: A Benchmark Generative Probabilistic Model for Wea…

200 papers

Partial label (PL) learning tackles the problem where each training instance is associated with a set of candidate labels that include both the true label and irrelevant noise labels. In this paper, we propose a novel multi-level generative…

Machine Learning · Computer Science 2020-05-13 Yan Yan, Yuhong Guo

As unlabeled data carry rich task-relevant information, they have proven useful for few-shot learning of language models. The question is how to effectively make use of such data. In this work, we revisit the self-training technique for…

Computation and Language · Computer Science 2021-10-05 Yiming Chen, Yan Zhang, Chen Zhang, Grandee Lee, Ran Cheng, Haizhou Li

Pseudo-labeling is a commonly used paradigm in semi-supervised learning, yet its application to semi-supervised regression (SSR) remains relatively under-explored. Unlike classification, where pseudo-labels are discrete and confidence-based…

Machine Learning · Computer Science 2025-10-20 Xueqing Sun, Renzhen Wang, Quanziang Wang, Yichen Wu, Xixi Jia, Deyu Meng

Although state-of-the-art Speech Foundational Models can produce high-quality text pseudo-labels, applying Semi-Supervised Learning (SSL) for in-the-wild real-world data remains challenging due to its richer and more complex acoustics…

Computation and Language · Computer Science 2026-03-16 Wen Ding, Fan Qian

Semi-supervised learning (SSL) tackles the label missing problem by enabling the effective usage of unlabeled data. While existing SSL methods focus on the traditional setting, a practical and challenging scenario called label Missing Not…

Machine Learning · Computer Science 2023-08-21 Yue Duan, Zhen Zhao, Lei Qi, Luping Zhou, Lei Wang, Yinghuan Shi

Semi-supervised learning (SSL) has attracted enormous attention due to its vast potential of mitigating the dependence on large labeled datasets. The latest methods (e.g., FixMatch) use a combination of consistency regularization and…

Computer Vision and Pattern Recognition · Computer Science 2023-03-21 Yuhao Chen, Xin Tan, Borui Zhao, Zhaowei Chen, Renjie Song, Jiajun Liang, Xuequan Lu

The recent success of large pre-trained language models (PLMs) heavily hinges on massive labeled data, which typically produces inferior performance in low-resource scenarios. To remedy this dilemma, we study self-training as one of the…

Machine Learning · Computer Science 2023-10-23 Jianing Wang, Qiushi Sun, Nuo Chen, Chengyu Wang, Jun Huang, Ming Gao, Xiang Li

Multi-label learning is a challenging computer vision task that requires assigning multiple categories to each image. However, fully annotating large-scale datasets is often impractical due to high costs and effort, motivating the study of…

Computer Vision and Pattern Recognition · Computer Science 2025-08-29 Luong Tran, Thieu Vo, Anh Nguyen, Sang Dinh, Van Nguyen

Pseudo Labeling is a technique used to improve the performance of semi-supervised Graph Neural Networks (GNNs) by generating additional pseudo-labels based on confident predictions. However, the quality of generated pseudo-labels has been a…

Machine Learning · Computer Science 2023-12-20 Weigang Lu, Ziyu Guan, Wei Zhao, Yaming Yang, Yuanhai Lv, Lining Xing, Baosheng Yu, Dacheng Tao

In conventional supervised pattern recognition tasks, model selection is typically accomplished by minimizing the classification error rate on a set of so-called development data, subject to ground-truth labeling by human experts or some…

Machine Learning · Statistics 2011-08-25 Christopher M. White, Sanjeev P. Khudanpur, Patrick J. Wolfe

Curation of large fully supervised datasets has become one of the major roadblocks for machine learning. Weak supervision provides an alternative to supervised learning by training with cheap, noisy, and possibly correlated labeling…

Machine Learning · Computer Science 2021-06-01 Chidubem Arachie, Bert Huang

Due to the high human cost of annotation, it is non-trivial to curate a large-scale medical dataset that is fully labeled for all classes of interest. Instead, it would be convenient to collect multiple small partially labeled datasets from…

Machine Learning · Computer Science 2022-04-20 Nanqing Dong, Jiayi Wang, Irina Voiculescu

The performance of deep learning-based natural language processing systems is based on large amounts of labeled training data which, in the clinical domain, are not easily available or affordable. Weak supervision and in-context learning…

Computation and Language · Computer Science 2025-04-02 Enshuo Hsu, Kirk Roberts

Recent studies have revealed the intriguing few-shot learning ability of pretrained language models (PLMs): They can quickly adapt to a new task when fine-tuned on a small amount of labeled data formulated as prompts, without requiring…

Computation and Language · Computer Science 2023-05-15 Yu Meng, Martin Michalski, Jiaxin Huang, Yu Zhang, Tarek Abdelzaher, Jiawei Han

Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates from a variety of sources. WS is theoretically well understood for binary classification, where…

Machine Learning · Computer Science 2022-11-28 Harit Vishwakarma, Nicholas Roberts, Frederic Sala

Pre-trained Vision-Language Models (VLMs) exhibit strong zero-shot classification abilities, demonstrating great potential for generating weakly supervised labels. Unfortunately, existing weakly supervised learning methods are short of…

Machine Learning · Computer Science 2025-06-04 Zhongnian Li, Jinghao Xu, Peng Ying, Meng Wei, Xinzheng Xu

Deep regression is an important problem with numerous applications. These range from computer vision tasks such as age estimation from photographs, to medical tasks such as ejection fraction estimation from echocardiograms for disease…

Computer Vision and Pattern Recognition · Computer Science 2023-02-16 Weihang Dai, Xiaomeng Li, Kwang-Ting Cheng

Building machine learning models for natural language understanding (NLU) tasks relies heavily on labeled data. Weak supervision has proven valuable when a large amount of labeled data is unavailable or expensive to obtain. Existing…

Computation and Language · Computer Science 2022-05-24 Guoqing Zheng, Giannis Karamanolakis, Kai Shu, Ahmed Hassan Awadallah

Influenza A viruses (IAVs) evolve antigenically at a pace that requires frequent vaccine updates, yet the haemagglutination inhibition (HI) assays used to quantify antigenicity are labor-intensive and unscalable. As a result, genomic data…

Machine Learning · Computer Science 2025-12-08 Yanhua Xu

Real-world training data is often noisy; for example, human annotators assign conflicting class labels to the same instances. Partial-label learning (PLL) is a weakly supervised learning paradigm that allows training classifiers in this…

Machine Learning · Computer Science 2025-10-27 Tobias Fuchs, Florian Kalinke