Related papers: Pairwise Feedback for Data Programming

200 papers

Learning with noisy labels has attracted a lot of attention in recent years, where the mainstream approaches are in pointwise manners. Meanwhile, pairwise manners have shown great potential in supervised metric learning and unsupervised…

Machine Learning · Computer Science 2021-06-18 Songhua Wu, Xiaobo Xia, Tongliang Liu, Bo Han, Mingming Gong, Nannan Wang, Haifeng Liu, Gang Niu

Programmatic weak supervision creates models without hand-labeled training data by combining the outputs of heuristic labelers. Existing frameworks make the restrictive assumption that labelers output a single class label. Enabling users to…

Machine Learning · Computer Science 2022-03-28 Peilin Yu, Tiffany Ding, Stephen H. Bach

Obtaining large annotated datasets is critical for training successful machine learning models and it is often a bottleneck in practice. Weak supervision offers a promising alternative for producing labeled datasets without ground truth…

Machine Learning · Computer Science 2021-01-27 Benedikt Boecking, Willie Neiswanger, Eric Xing, Artur Dubrawski

Noise in data appears to be inevitable in most real-world machine learning applications and would cause severe overfitting problems. Not only can data features contain noise, but labels are also prone to be noisy due to human input. In this…

Machine Learning · Computer Science 2025-05-09 Weipeng Huang, Qin Li, Yang Xiao, Cheng Qiao, Tie Cai, Junwei Liang, Neil J. Hurley, Guangyuan Piao

Machine learning systems impact many stakeholders and groups of users, often disparately. Prior studies have reconciled conflicting user preferences by aggregating a high volume of manually labeled pairwise comparisons, but this technique…

Computers and Society · Computer Science 2020-12-04 Ryan Steed, Benjamin Williams

Large labeled training sets are the critical building blocks of supervised learning methods and are key enablers of deep learning techniques. For some applications, creating labeled training sets is the most time-consuming and expensive…

Machine Learning · Statistics 2018-12-10 Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Christopher Ré

To collect large scale annotated data, it is inevitable to introduce label noise, i.e., incorrect class labels. To be robust against label noise, many successful methods rely on the noisy classifiers (i.e., models trained on the noisy…

Computer Vision and Pattern Recognition · Computer Science 2020-11-23 Songzhu Zheng, Pengxiang Wu, Aman Goswami, Mayank Goswami, Dimitris Metaxas, Chao Chen

Manual labelling of training examples is common practice in supervised learning. When the labelling task is of non-trivial difficulty, the supplied labels may not be equal to the ground-truth labels, and label noise is introduced into the…

Machine Learning · Statistics 2021-04-08 Daniel Ahfock, Geoffrey J. McLachlan

Manually labeled corpora are expensive to create and often not available for low-resource languages or domains. Automatic labeling approaches are an alternative way to obtain labeled data in a quicker and cheaper way. However, these labels…

Machine Learning · Computer Science 2018-07-24 Michael A. Hedderich, Dietrich Klakow

Scarcity of labeled data is a bottleneck for supervised learning models. A paradigm that has evolved for dealing with this problem is data programming. An existing data programming paradigm allows human supervision to be provided as a set…

Machine Learning · Computer Science 2019-11-25 Oishik Chatterjee, Ganesh Ramakrishnan, Sunita Sarawagi

Deep neural networks can memorize corrupted labels, making data quality critical for model performance, yet real-world datasets are frequently compromised by both label noise and input noise. This paper proposes a mutual information-based…

Machine Learning · Computer Science 2025-08-12 Jinghan Yang, Jiayu Weng

Labeling training data has become one of the major roadblocks to using machine learning. Among various weak supervision paradigms, programmatic weak supervision (PWS) has achieved remarkable success in easing the manual labeling bottleneck…

Machine Learning · Computer Science 2022-02-15 Jieyu Zhang, Cheng-Yu Hsieh, Yue Yu, Chao Zhang, Alexander Ratner

We study crowdsourced PAC learning of threshold functions, where the labels are gathered from a pool of annotators some of whom may behave adversarially. This is yet a challenging problem and until recently has computationally and query…

Machine Learning · Computer Science 2022-12-07 Shiwei Zeng, Jie Shen

Several works in computer vision have demonstrated the effectiveness of active learning for adapting the recognition model when new unlabeled data becomes available. Most of these works consider that labels obtained from the annotator are…

Computer Vision and Pattern Recognition · Computer Science 2020-10-20 Sudipta Paul, Shivkumar Chandrasekaran, B. S. Manjunath, Amit K. Roy-Chowdhury

Pairwise preference data have played an important role in the alignment of large language models (LLMs). Each sample of such data consists of a prompt, two different responses to the prompt, and a binary label indicating which of the two…

Computation and Language · Computer Science 2026-05-12 Zhongze Cai, Xiaocheng Li

The recent success of deep learning is mostly due to the availability of big datasets with clean annotations. However, gathering a cleanly annotated dataset is not always feasible due to practical challenges. As a result, label noise is a…

Computer Vision and Pattern Recognition · Computer Science 2020-03-25 Görkem Algan, İlkay Ulusoy

In many applications, training machine learning models involves using large amounts of human-annotated data. Obtaining precise labels for the data is expensive. Instead, training with weak supervision provides a low-cost alternative. We…

Machine Learning · Computer Science 2022-02-09 Chidubem Arachie, Bert Huang

Adopting a two-stage paradigm of pretraining followed by fine-tuning, Pretrained Language Models (PLMs) have achieved substantial advancements in the field of natural language processing. However, in real-world scenarios, data labels are…

Computation and Language · Computer Science 2023-11-03 Song Wang, Zhen Tan, Ruocheng Guo, Jundong Li

Noisy labels are both inevitable and problematic in machine learning methods, as they negatively impact models' generalization ability by causing overfitting. In the context of learning with noise, the transition matrix plays a crucial role…

Machine Learning · Computer Science 2025-03-26 Jiahui Li, Tai-Wei Chang, Kun Kuang, Ximing Li, Long Chen, Jun Zhou

Real-world text classification tasks often require many labeled training examples that are expensive to obtain. Recent advancements in machine teaching, specifically the data programming paradigm, facilitate the creation of training data…

Machine Learning · Computer Science 2020-02-05 Neil Mallinar, Abhishek Shah, Tin Kam Ho, Rajendra Ugrani, Ayush Gupta