Related papers: Nemo: Guiding and Contextualizing Weak Supervision…

Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling

Obtaining large annotated datasets is critical for training successful machine learning models and it is often a bottleneck in practice. Weak supervision offers a promising alternative for producing labeled datasets without ground truth…

Machine Learning · Computer Science 2021-01-27 Benedikt Boecking, Willie Neiswanger, Eric Xing, Artur Dubrawski

A Survey on Programmatic Weak Supervision

Labeling training data has become one of the major roadblocks to using machine learning. Among various weak supervision paradigms, programmatic weak supervision (PWS) has achieved remarkable success in easing the manual labeling bottleneck…

Machine Learning · Computer Science 2022-02-15 Jieyu Zhang, Cheng-Yu Hsieh, Yue Yu, Chao Zhang, Alexander Ratner

ScriptoriumWS: A Code Generation Assistant for Weak Supervision

Weak supervision is a popular framework for overcoming the labeled data bottleneck: the need to obtain labels for training data. In weak supervision, multiple noisy-but-cheap sources are used to provide guesses of the label and are…

Machine Learning · Computer Science 2025-02-19 Tzu-Heng Huang, Catherine Cao, Spencer Schoenberg, Harit Vishwakarma, Nicholas Roberts, Frederic Sala

Losses over Labels: Weakly Supervised Learning via Direct Loss Construction

Owing to the prohibitive costs of generating large amounts of labeled data, programmatic weak supervision is a growing paradigm within machine learning. In this setting, users design heuristics that provide noisy labels for subsets of the…

Machine Learning · Computer Science 2023-10-06 Dylan Sam, J. Zico Kolter

AutoWS: Automated Weak Supervision Framework for Text Classification

Creating large, good quality labeled data has become one of the major bottlenecks for developing machine learning applications. Multiple techniques have been developed to either decrease the dependence of labeled data (zero/few-shot…

Computation and Language · Computer Science 2023-02-08 Abhinav Bohra, Huy Nguyen, Devashish Khatwani

Creating Training Sets via Weak Indirect Supervision

Creating labeled training sets has become one of the major roadblocks in machine learning. To address this, recent Weak Supervision (WS) frameworks synthesize training labels from multiple potentially noisy supervision sources.…

Machine Learning · Computer Science 2022-03-16 Jieyu Zhang, Bohan Wang, Xiangchen Song, Yujing Wang, Yaming Yang, Jing Bai, Alexander Ratner

Universalizing Weak Supervision

Weak supervision (WS) frameworks are a popular way to bypass hand-labeling large datasets for training data-hungry models. These approaches synthesize multiple noisy but cheaply-acquired estimates of labels into a set of high-quality…

Machine Learning · Computer Science 2023-11-30 Changho Shin, Winfred Li, Harit Vishwakarma, Nicholas Roberts, Frederic Sala

Binary Classification with Positive Labeling Sources

To create a large amount of training labels for machine learning models effectively and efficiently, researchers have turned to Weak Supervision (WS), which uses programmatic labeling sources rather than manual annotation. Existing works of…

Machine Learning · Computer Science 2022-08-04 Jieyu Zhang, Yujing Wang, Yaming Yang, Yang Luo, Alexander Ratner

WRENCH: A Comprehensive Benchmark for Weak Supervision

Recent Weak Supervision (WS) approaches have had widespread success in easing the bottleneck of labeling training data for machine learning by synthesizing labels from multiple potentially noisy supervision sources. However, proper…

Machine Learning · Computer Science 2021-10-12 Jieyu Zhang, Yue Yu, Yinghao Li, Yujing Wang, Yaming Yang, Mao Yang, Alexander Ratner

End-to-End Weak Supervision

Aggregating multiple sources of weak supervision (WS) can ease the data-labeling bottleneck prevalent in many machine learning applications, by replacing the tedious manual collection of ground truth labels. Current state of the art…

Machine Learning · Computer Science 2021-12-01 Salva Rühling Cachay, Benedikt Boecking, Artur Dubrawski

Understanding temporally weakly supervised training: A case study for keyword spotting

The currently most prominent algorithm to train keyword spotting (KWS) models with deep neural networks (DNNs) requires strong supervision i.e., precise knowledge of the spoken keyword location in time. Thus, most KWS approaches treat the…

Sound · Computer Science 2023-05-31 Heinrich Dinkel, Weiji Zhuang, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang

WeaNF: Weak Supervision with Normalizing Flows

A popular approach to decrease the need for costly manual annotation of large data sets is weak supervision, which introduces problems of noisy labels, coverage and bias. Methods for overcoming these problems have either relied on…

Computation and Language · Computer Science 2022-05-03 Andreas Stephan, Benjamin Roth

Data Consistency for Weakly Supervised Learning

In many applications, training machine learning models involves using large amounts of human-annotated data. Obtaining precise labels for the data is expensive. Instead, training with weak supervision provides a low-cost alternative. We…

Machine Learning · Computer Science 2022-02-09 Chidubem Arachie, Bert Huang

Advanced Weakly-Supervised Formula Exploration for Neuro-Symbolic Mathematical Reasoning

In recent years, neuro-symbolic methods have become a popular and powerful approach that augments artificial intelligence systems with the capability to perform abstract, logical, and quantitative deductions with enhanced precision and…

Artificial Intelligence · Computer Science 2025-02-04 Yuxuan Wu, Hideki Nakayama

Generalized Weak Supervision for Neural Information Retrieval

Neural ranking models (NRMs) have demonstrated effective performance in several information retrieval (IR) tasks. However, training NRMs often requires large-scale training data, which is difficult and expensive to obtain. To address this…

Information Retrieval · Computer Science 2023-04-19 Yen-Chieh Lien, Hamed Zamani, W. Bruce Croft

The Word is Mightier than the Label: Learning without Pointillistic Labels using Data Programming

Most advanced supervised Machine Learning (ML) models rely on vast amounts of point-by-point labelled training examples. Hand-labelling vast amounts of data may be tedious, expensive, and error-prone. Recently, some studies have explored…

Machine Learning · Computer Science 2021-08-27 Chufan Gao, Mononito Goswami

Learning Hyper Label Model for Programmatic Weak Supervision

To reduce the human annotation efforts, the programmatic weak supervision (PWS) paradigm abstracts weak supervision sources as labeling functions (LFs) and involves a label model to aggregate the output of multiple LFs to produce training…

Machine Learning · Computer Science 2023-03-09 Renzhi Wu, Shen-En Chen, Jieyu Zhang, Xu Chu

Label Augmentation with Reinforced Labeling for Weak Supervision

Weak supervision (WS) is an alternative to the traditional supervised learning to address the need for ground truth. Data programming is a practical WS approach that allows programmatic labeling data samples using labeling functions (LFs)…

Machine Learning · Computer Science 2022-04-14 Gürkan Solmaz, Flavio Cirillo, Fabio Maresca, Anagha Gode Anil Kumar

Lifting Weak Supervision To Structured Prediction

Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates from a variety of sources. WS is theoretically well understood for binary classification, where…

Machine Learning · Computer Science 2022-11-28 Harit Vishwakarma, Nicholas Roberts, Frederic Sala

Heuristic-Based Weak Learning for Automated Decision-Making

Machine learning systems impact many stakeholders and groups of users, often disparately. Prior studies have reconciled conflicting user preferences by aggregating a high volume of manually labeled pairwise comparisons, but this technique…

Computers and Society · Computer Science 2020-12-04 Ryan Steed, Benjamin Williams