Related papers: Understanding Programmatic Weak Supervision via So…

WeShap: Weak Supervision Source Evaluation with Shapley Values

Efficient data annotation stands as a significant bottleneck in training contemporary machine learning models. The Programmatic Weak Supervision (PWS) pipeline presents a solution by utilizing multiple weak supervision sources to…

Machine Learning · Computer Science 2025-03-18 Naiqing Guan, Nick Koudas

A Survey on Programmatic Weak Supervision

Labeling training data has become one of the major roadblocks to using machine learning. Among various weak supervision paradigms, programmatic weak supervision (PWS) has achieved remarkable success in easing the manual labeling bottleneck…

Machine Learning · Computer Science 2022-02-15 Jieyu Zhang, Cheng-Yu Hsieh, Yue Yu, Chao Zhang, Alexander Ratner

Weak Supervision Performance Evaluation via Partial Identification

Programmatic Weak Supervision (PWS) enables supervised model training without direct access to ground truth labels, utilizing weak labels from heuristics, crowdsourcing, or pre-trained models. However, the absence of ground truth…

Machine Learning · Statistics 2024-11-01 Felipe Maia Polo, Subha Maity, Mikhail Yurochkin, Moulinath Banerjee, Yuekai Sun

End-to-End Weak Supervision

Aggregating multiple sources of weak supervision (WS) can ease the data-labeling bottleneck prevalent in many machine learning applications, by replacing the tedious manual collection of ground truth labels. Current state of the art…

Machine Learning · Computer Science 2021-12-01 Salva Rühling Cachay, Benedikt Boecking, Artur Dubrawski

Weak Supervision with Incremental Source Accuracy Estimation

Motivated by the desire to generate labels for real-time data we develop a method to estimate the dependency structure and accuracy of weak supervision sources incrementally. Our method first estimates the dependency structure associated…

Machine Learning · Computer Science 2022-05-12 Richard Gresham Correro

Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision

Programmatic Weak Supervision (PWS) has emerged as a widespread paradigm to synthesize training labels efficiently. The core component of PWS is the label model, which infers true labels by aggregating the outputs of multiple noisy…

Machine Learning · Computer Science 2022-10-11 Jieyu Zhang, Linxin Song, Alexander Ratner

Fusing Conditional Submodular GAN and Programmatic Weak Supervision

Programmatic Weak Supervision (PWS) and generative models serve as crucial tools that enable researchers to maximize the utility of existing datasets without resorting to laborious data gathering and manual annotation processes. PWS uses…

Computer Vision and Pattern Recognition · Computer Science 2023-12-19 Kumar Shubham, Pranav Sastry, Prathosh AP

Learning Hyper Label Model for Programmatic Weak Supervision

To reduce the human annotation efforts, the programmatic weak supervision (PWS) paradigm abstracts weak supervision sources as labeling functions (LFs) and involves a label model to aggregate the output of multiple LFs to produce training…

Machine Learning · Computer Science 2023-03-09 Renzhi Wu, Shen-En Chen, Jieyu Zhang, Xu Chu

Lifting Weak Supervision To Structured Prediction

Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates from a variety of sources. WS is theoretically well understood for binary classification, where…

Machine Learning · Computer Science 2022-11-28 Harit Vishwakarma, Nicholas Roberts, Frederic Sala

Creating Training Sets via Weak Indirect Supervision

Creating labeled training sets has become one of the major roadblocks in machine learning. To address this, recent Weak Supervision (WS) frameworks synthesize training labels from multiple potentially noisy supervision sources.…

Machine Learning · Computer Science 2022-03-16 Jieyu Zhang, Bohan Wang, Xiangchen Song, Yujing Wang, Yaming Yang, Jing Bai, Alexander Ratner

Reliable Programmatic Weak Supervision with Confidence Intervals for Label Probabilities

The accurate labeling of datasets is often both costly and time-consuming. Given an unlabeled dataset, programmatic weak supervision obtains probabilistic predictions for the labels by leveraging multiple weak labeling functions (LFs) that…

Machine Learning · Statistics 2025-08-07 Verónica Álvarez, Santiago Mazuelas, Steven An, Sanjoy Dasgupta

Low Resource Pipeline for Spoken Language Understanding via Weak Supervision

In Weak Supervised Learning (WSL), a model is trained over noisy labels obtained from semantic rules and task-specific pre-trained models. Rules offer limited generalization over tasks and require significant manual efforts while…

Computation and Language · Computer Science 2022-06-22 Ayush Kumar, Rishabh Kumar Tripathi, Jithendra Vepa

Mitigating Source Bias for Fairer Weak Supervision

Weak supervision enables efficient development of training sets by reducing the need for ground truth labels. However, the techniques that make weak supervision attractive -- such as integrating any source of signal to estimate unknown…

Machine Learning · Computer Science 2023-11-30 Changho Shin, Sonia Cromp, Dyah Adila, Frederic Sala

Refining Labeling Functions with Limited Labeled Data

Programmatic weak supervision (PWS) significantly reduces human effort for labeling data by combining the outputs of user-provided labeling functions (LFs) on unlabeled datapoints. However, the quality of the generated labels depends…

Machine Learning · Computer Science 2025-06-05 Chenjie Li, Amir Gilad, Boris Glavic, Zhengjie Miao, Sudeepa Roy

Deeper Understanding of Black-box Predictions via Generalized Influence Functions

Influence functions (IFs) elucidate how training data changes model behavior. However, the increasing size and non-convexity in large-scale models make IFs inaccurate. We suspect that the fragility comes from the first-order approximation…

Machine Learning · Computer Science 2024-05-07 Hyeonsu Lyu, Jonggyu Jang, Sehyun Ryu, Hyun Jong Yang

Integrated Weak Learning

We introduce Integrated Weak Learning, a principled framework that integrates weak supervision into the training process of machine learning models. Our approach jointly trains the end-model and a label model that aggregates multiple…

Machine Learning · Computer Science 2022-06-22 Peter Hayes, Mingtian Zhang, Raza Habib, Jordan Burgess, Emine Yilmaz, David Barber

Supervising Feature Influence

Causal influence measures for machine learnt classifiers shed light on the reasons behind classification, and aid in identifying influential input features and revealing their biases. However, such analyses involve evaluating the classifier…

Machine Learning · Computer Science 2018-04-10 Shayak Sen, Piotr Mardziel, Anupam Datta, Matthew Fredrikson

Towards Robust Influence Functions with Flat Validation Minima

The Influence Function (IF) is a widely used technique for assessing the impact of individual training samples on model predictions. However, existing IF methods often fail to provide reliable influence estimates in deep neural networks,…

Machine Learning · Computer Science 2025-12-02 Xichen Ye, Yifan Wu, Weizhong Zhang, Cheng Jin, Yifan Chen

Training Subset Selection for Weak Supervision

Existing weak supervision approaches use all the data covered by weak signals to train a classifier. We show both theoretically and empirically that this is not always optimal. Intuitively, there is a tradeoff between the amount of…

Machine Learning · Statistics 2023-03-08 Hunter Lang, Aravindan Vijayaraghavan, David Sontag

Leveraging Large Language Models for Structure Learning in Prompted Weak Supervision

Prompted weak supervision (PromptedWS) applies pre-trained large language models (LLMs) as the basis for labeling functions (LFs) in a weak supervision framework to obtain large labeled datasets. We further extend the use of LLMs in the…

Machine Learning · Computer Science 2024-02-06 Jinyan Su, Peilin Yu, Jieyu Zhang, Stephen H. Bach