Related papers: Pairwise Feedback for Data Programming

Class2Simi: A Noise Reduction Perspective on Learning with Noisy Labels

Learning with noisy labels has attracted a lot of attention in recent years, where the mainstream approaches are in pointwise manners. Meanwhile, pairwise manners have shown great potential in supervised metric learning and unsupervised…

Machine Learning · Computer Science 2021-06-18 Songhua Wu, Xiaobo Xia, Tongliang Liu, Bo Han, Mingming Gong, Nannan Wang, Haifeng Liu, Gang Niu

Learning from Multiple Noisy Partial Labelers

Programmatic weak supervision creates models without hand-labeled training data by combining the outputs of heuristic labelers. Existing frameworks make the restrictive assumption that labelers output a single class label. Enabling users to…

Machine Learning · Computer Science 2022-03-28 Peilin Yu, Tiffany Ding, Stephen H. Bach

Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling

Obtaining large annotated datasets is critical for training successful machine learning models and it is often a bottleneck in practice. Weak supervision offers a promising alternative for producing labeled datasets without ground truth…

Machine Learning · Computer Science 2021-01-27 Benedikt Boecking, Willie Neiswanger, Eric Xing, Artur Dubrawski

Correcting Noisy Multilabel Predictions: Modeling Label Noise through Latent Space Shifts

Noise in data appears to be inevitable in most real-world machine learning applications and would cause severe overfitting problems. Not only can data features contain noise, but labels are also prone to be noisy due to human input. In this…

Machine Learning · Computer Science 2025-05-09 Weipeng Huang, Qin Li, Yang Xiao, Cheng Qiao, Tie Cai, Junwei Liang, Neil J. Hurley, Guangyuan Piao

Heuristic-Based Weak Learning for Automated Decision-Making

Machine learning systems impact many stakeholders and groups of users, often disparately. Prior studies have reconciled conflicting user preferences by aggregating a high volume of manually labeled pairwise comparisons, but this technique…

Computers and Society · Computer Science 2020-12-04 Ryan Steed, Benjamin Williams

Data Programming: Creating Large Training Sets, Quickly

Large labeled training sets are the critical building blocks of supervised learning methods and are key enablers of deep learning techniques. For some applications, creating labeled training sets is the most time-consuming and expensive…

Machine Learning · Statistics 2018-12-10 Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Christopher Ré

Error-Bounded Correction of Noisy Labels

To collect large scale annotated data, it is inevitable to introduce label noise, i.e., incorrect class labels. To be robust against label noise, many successful methods rely on the noisy classifiers (i.e., models trained on the noisy…

Computer Vision and Pattern Recognition · Computer Science 2020-11-23 Songzhu Zheng, Pengxiang Wu, Aman Goswami, Mayank Goswami, Dimitris Metaxas, Chao Chen

Harmless label noise and informative soft-labels in supervised classification

Manual labelling of training examples is common practice in supervised learning. When the labelling task is of non-trivial difficulty, the supplied labels may not be equal to the ground-truth labels, and label noise is introduced into the…

Machine Learning · Statistics 2021-04-08 Daniel Ahfock, Geoffrey J. McLachlan

Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data

Manually labeled corpora are expensive to create and often not available for low-resource languages or domains. Automatic labeling approaches are an alternative way to obtain labeled data in a quicker and cheaper way. However, these labels…

Machine Learning · Computer Science 2018-07-24 Michael A. Hedderich, Dietrich Klakow

Data Programming using Continuous and Quality-Guided Labeling Functions

Scarcity of labeled data is a bottleneck for supervised learning models. A paradigm that has evolved for dealing with this problem is data programming. An existing data programming paradigm allows human supervision to be provided as a set…

Machine Learning · Computer Science 2019-11-25 Oishik Chatterjee, Ganesh Ramakrishnan, Sunita Sarawagi

Detecting Mislabeled and Corrupted Data via Pointwise Mutual Information

Deep neural networks can memorize corrupted labels, making data quality critical for model performance, yet real-world datasets are frequently compromised by both label noise and input noise. This paper proposes a mutual information-based…

Machine Learning · Computer Science 2025-08-12 Jinghan Yang, Jiayu Weng

A Survey on Programmatic Weak Supervision

Labeling training data has become one of the major roadblocks to using machine learning. Among various weak supervision paradigms, programmatic weak supervision (PWS) has achieved remarkable success in easing the manual labeling bottleneck…

Machine Learning · Computer Science 2022-02-15 Jieyu Zhang, Cheng-Yu Hsieh, Yue Yu, Chao Zhang, Alexander Ratner

Efficient PAC Learning from the Crowd with Pairwise Comparisons

We study crowdsourced PAC learning of threshold functions, where the labels are gathered from a pool of annotators some of whom may behave adversarially. This is yet a challenging problem and until recently has computationally and query…

Machine Learning · Computer Science 2022-12-07 Shiwei Zeng, Jie Shen

Exploiting Context for Robustness to Label Noise in Active Learning

Several works in computer vision have demonstrated the effectiveness of active learning for adapting the recognition model when new unlabeled data becomes available. Most of these works consider that labels obtained from the annotator are…

Computer Vision and Pattern Recognition · Computer Science 2020-10-20 Sudipta Paul, Shivkumar Chandrasekaran, B. S. Manjunath, Amit K. Roy-Chowdhury

Users as Annotators: LLM Preference Learning from Comparison Mode

Pairwise preference data have played an important role in the alignment of large language models (LLMs). Each sample of such data consists of a prompt, two different responses to the prompt, and a binary label indicating which of the two…

Computation and Language · Computer Science 2026-05-12 Zhongze Cai, Xiaocheng Li

Label Noise Types and Their Effects on Deep Learning

The recent success of deep learning is mostly due to the availability of big datasets with clean annotations. However, gathering a cleanly annotated dataset is not always feasible due to practical challenges. As a result, label noise is a…

Computer Vision and Pattern Recognition · Computer Science 2020-03-25 Görkem Algan, İlkay Ulusoy

Data Consistency for Weakly Supervised Learning

In many applications, training machine learning models involves using large amounts of human-annotated data. Obtaining precise labels for the data is expensive. Instead, training with weak supervision provides a low-cost alternative. We…

Machine Learning · Computer Science 2022-02-09 Chidubem Arachie, Bert Huang

Noise-Robust Fine-Tuning of Pretrained Language Models via External Guidance

Adopting a two-stage paradigm of pretraining followed by fine-tuning, Pretrained Language Models (PLMs) have achieved substantial advancements in the field of natural language processing. However, in real-world scenarios, data labels are…

Computation and Language · Computer Science 2023-11-03 Song Wang, Zhen Tan, Ruocheng Guo, Jundong Li

Learning Causal Transition Matrix for Instance-dependent Label Noise

Noisy labels are both inevitable and problematic in machine learning methods, as they negatively impact models' generalization ability by causing overfitting. In the context of learning with noise, the transition matrix plays a crucial role…

Machine Learning · Computer Science 2025-03-26 Jiahui Li, Tai-Wei Chang, Kun Kuang, Ximing Li, Long Chen, Jun Zhou

Iterative Data Programming for Expanding Text Classification Corpora

Real-world text classification tasks often require many labeled training examples that are expensive to obtain. Recent advancements in machine teaching, specifically the data programming paradigm, facilitate the creation of training data…

Machine Learning · Computer Science 2020-02-05 Neil Mallinar, Abhishek Shah, Tin Kam Ho, Rajendra Ugrani, Ayush Gupta