English
Related papers

Related papers: Weak Supervision with Incremental Source Accuracy …

200 papers

Labeling training data is a key bottleneck in the modern machine learning pipeline. Recent weak supervision approaches combine labels from multiple noisy sources by estimating their accuracies without access to ground truth labels; however,…

Machine Learning · Statistics 2019-03-15 Paroma Varma , Frederic Sala , Ann He , Alexander Ratner , Christopher Ré

The accurate labeling of datasets is often both costly and time-consuming. Given an unlabeled dataset, programmatic weak supervision obtains probabilistic predictions for the labels by leveraging multiple weak labeling functions (LFs) that…

Machine Learning · Statistics 2025-08-07 Verónica Álvarez , Santiago Mazuelas , Steven An , Sanjoy Dasgupta

We introduce an adaptive method with formal quality guarantees for weak supervision in a non-stationary setting. Our goal is to infer the unknown labels of a sequence of data by using weak supervision sources that provide independent noisy…

Machine Learning · Computer Science 2025-05-05 Alessio Mazzetto , Reza Esfandiarpoor , Akash Singirikonda , Eli Upfal , Stephen H. Bach

Weakly-supervised learning is a paradigm for alleviating the scarcity of labeled data by leveraging lower-quality but larger-scale supervision signals. While existing work mainly focuses on utilizing a certain type of weak supervision, we…

Machine Learning · Statistics 2019-10-11 Yivan Zhang , Nontawat Charoenphakdee , Masashi Sugiyama

Aggregating multiple sources of weak supervision (WS) can ease the data-labeling bottleneck prevalent in many machine learning applications, by replacing the tedious manual collection of ground truth labels. Current state of the art…

Machine Learning · Computer Science 2021-12-01 Salva Rühling Cachay , Benedikt Boecking , Artur Dubrawski

Weak supervision enables efficient development of training sets by reducing the need for ground truth labels. However, the techniques that make weak supervision attractive -- such as integrating any source of signal to estimate unknown…

Machine Learning · Computer Science 2023-11-30 Changho Shin , Sonia Cromp , Dyah Adila , Frederic Sala

As machine learning models continue to increase in complexity, collecting large hand-labeled training sets has become one of the biggest roadblocks in practice. Instead, weaker forms of supervision that provide noisier but cheaper labels…

Machine Learning · Statistics 2018-12-10 Alexander Ratner , Braden Hancock , Jared Dunnmon , Frederic Sala , Shreyash Pandey , Christopher Ré

Existing weak supervision approaches use all the data covered by weak signals to train a classifier. We show both theoretically and empirically that this is not always optimal. Intuitively, there is a tradeoff between the amount of…

Machine Learning · Statistics 2023-03-08 Hunter Lang , Aravindan Vijayaraghavan , David Sontag

In many applications, training machine learning models involves using large amounts of human-annotated data. Obtaining precise labels for the data is expensive. Instead, training with weak supervision provides a low-cost alternative. We…

Machine Learning · Computer Science 2022-02-09 Chidubem Arachie , Bert Huang

We introduce Integrated Weak Learning, a principled framework that integrates weak supervision into the training process of machine learning models. Our approach jointly trains the end-model and a label model that aggregates multiple…

Machine Learning · Computer Science 2022-06-22 Peter Hayes , Mingtian Zhang , Raza Habib , Jordan Burgess , Emine Yilmaz , David Barber

The expense of acquiring labels in large-scale statistical machine learning makes partially and weakly-labeled data attractive, though it is not always apparent how to leverage such data for model fitting or validation. We present a…

Machine Learning · Statistics 2022-02-10 Maxime Cauchois , Suyash Gupta , Alnur Ali , John Duchi

Many promising applications of supervised machine learning face hurdles in the acquisition of labeled data in sufficient quantity and quality, creating an expensive bottleneck. To overcome such limitations, techniques that do not depend on…

Machine Learning · Computer Science 2023-03-14 Benedikt Boecking , Nicholas Roberts , Willie Neiswanger , Stefano Ermon , Frederic Sala , Artur Dubrawski

Although existing semantic segmentation approaches achieve impressive results, they still struggle to update their models incrementally as new categories are uncovered. Furthermore, pixel-by-pixel annotations are expensive and…

Computer Vision and Pattern Recognition · Computer Science 2022-04-04 Fabio Cermelli , Dario Fontanel , Antonio Tavera , Marco Ciccone , Barbara Caputo

Neural network approaches have recently shown to be effective in several information retrieval (IR) tasks. However, neural approaches often require large volumes of training data to perform effectively, which is not always available. To…

Information Retrieval · Computer Science 2018-06-14 Hamed Zamani , W. Bruce Croft

Owing to the prohibitive costs of generating large amounts of labeled data, programmatic weak supervision is a growing paradigm within machine learning. In this setting, users design heuristics that provide noisy labels for subsets of the…

Machine Learning · Computer Science 2023-10-06 Dylan Sam , J. Zico Kolter

Supervised learning usually requires a large amount of labelled data. However, attaining ground-truth labels is costly for many tasks. Alternatively, weakly supervised methods learn with cheap weak signals that only approximately label some…

Machine Learning · Computer Science 2024-11-26 You Lu , Wenzhuo Song , Chidubem Arachie , Bert Huang

The cost and scarcity of fully supervised labels in statistical machine learning encourage using partially labeled data for model validation as a cheaper and more accessible alternative. Effectively collecting and leveraging weakly…

Machine Learning · Statistics 2022-06-16 Maxime Cauchois , John Duchi

Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates from a variety of sources. WS is theoretically well understood for binary classification, where…

Machine Learning · Computer Science 2022-11-28 Harit Vishwakarma , Nicholas Roberts , Frederic Sala

In scenarios where training data is limited due to observation costs or data scarcity, enriching the label information associated with each instance becomes crucial for building high-accuracy classification models. In such contexts, it is…

Machine Learning · Computer Science 2025-07-25 Kosuke Sugiyama , Masato Uchida

Programmatic weak supervision creates models without hand-labeled training data by combining the outputs of heuristic labelers. Existing frameworks make the restrictive assumption that labelers output a single class label. Enabling users to…

Machine Learning · Computer Science 2022-03-28 Peilin Yu , Tiffany Ding , Stephen H. Bach
‹ Prev 1 2 3 10 Next ›