Related papers: Efficient Human Computation

Active clustering for labeling training data

Gathering training data is a key step of any supervised learning task, and it is both critical and expensive. Critical, because the quantity and quality of the training data has a high impact on the performance of the learned function.…

Data Structures and Algorithms · Computer Science 2021-10-28 Quentin Lutz, Élie de Panafieu, Alex Scott, Maya Stein

Labels, Information, and Computation: Efficient Learning Using Sufficient Labels

In supervised learning, obtaining a large set of fully-labeled training data is expensive. We show that we do not always need full label information on every single training example to train a competent classifier. Specifically, inspired by…

Machine Learning · Computer Science 2023-01-18 Shiyu Duan, Spencer Chang, Jose C. Principe

Efficient PAC Learning from the Crowd

In recent years crowdsourcing has become the method of choice for gathering labeled training data for learning algorithms. Standard approaches to crowdsourcing view the process of acquiring labeled data separately from the process of…

Machine Learning · Computer Science 2017-04-17 Pranjal Awasthi, Avrim Blum, Nika Haghtalab, Yishay Mansour

Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

Data is the engine of modern computer vision, which necessitates collecting large-scale datasets. This is expensive, and guaranteeing the quality of the labels is a major challenge. In this paper, we investigate efficient annotation…

Computer Vision and Pattern Recognition · Computer Science 2021-04-27 Yuan-Hong Liao, Amlan Kar, Sanja Fidler

Label Distribution Learning

Although multi-label learning can deal with many problems with label ambiguity, it does not fit some real applications well where the overall distribution of the importance of the labels matters. This paper proposes a novel learning…

Machine Learning · Computer Science 2016-04-06 Xin Geng

How to distribute data across tasks for meta-learning?

Meta-learning models transfer the knowledge acquired from previous tasks to quickly learn new ones. They are trained on benchmarks with a fixed number of data points per task. This number is usually arbitrary and it is unknown how it…

Machine Learning · Computer Science 2022-04-11 Alexandru Cioba, Michael Bromberg, Qian Wang, Ritwik Niyogi, Georgios Batzolis, Jezabel Garcia, Da-shan Shiu, Alberto Bernacchia

Aggregating Soft Labels from Crowd Annotations Improves Uncertainty Estimation Under Distribution Shift

Selecting an effective training signal for machine learning tasks is difficult: expert annotations are expensive, and crowd-sourced annotations may not be reliable. Recent work has demonstrated that learning from a distribution over labels…

Computation and Language · Computer Science 2025-04-23 Dustin Wright, Isabelle Augenstein

Label Efficient Learning by Exploiting Multi-class Output Codes

We present a new perspective on the popular multi-class algorithmic techniques of one-vs-all and error correcting output codes. Rather than studying the behavior of these techniques for supervised learning, we establish a connection between…

Machine Learning · Computer Science 2016-11-28 Maria Florina Balcan, Travis Dick, Yishay Mansour

A Data Management Approach for Dataset Selection Using Human Computation

As the number of applications that use machine learning algorithms increases, the need for labeled data useful for training such algorithms intensifies. Getting labels typically involves employing humans to do the annotation, which directly…

Machine Learning · Computer Science 2013-07-16 Alexandros Ntoulas, Omar Alonso, Vasilis Kandylas

Efficiency of active learning for the allocation of workers on crowdsourced classification tasks

Crowdsourcing has been successfully employed in the past as an effective and cheap way to execute classification tasks and has therefore attracted the attention of the research community. However, we still lack a theoretical understanding…

Human-Computer Interaction · Computer Science 2016-10-20 Edoardo Manino, Long Tran-Thanh, Nicholas R. Jennings

Towards Imbalanced Large Scale Multi-label Classification with Partially Annotated Labels

Multi-label classification is a widely encountered problem in daily life, where an instance can be associated with multiple classes. In theory, this is a supervised learning method that requires a large amount of labeling. However,…

Computer Vision and Pattern Recognition · Computer Science 2023-08-02 Xin Zhang, Yuqi Song, Fei Zuo, Xiaofeng Wang

Label Budget Allocation in Multi-Task Learning

The cost of labeling data often limits the performance of machine learning systems. In multi-task learning, related tasks provide information to each other and improve overall performance, but the label cost can vary among tasks. How should…

Machine Learning · Computer Science 2023-08-25 Ximeng Sun, Kihyuk Sohn, Kate Saenko, Clayton Mellina, Xiao Bian

Beyond Hard Labels: Investigating data label distributions

High-quality data is a key aspect of modern machine learning. However, labels generated by humans suffer from issues like label noise and class ambiguities. We raise the question of whether hard labels are sufficient to represent the…

Computer Vision and Pattern Recognition · Computer Science 2022-10-07 Vasco Grossmann, Lars Schmarje, Reinhard Koch

Optimizing the Wisdom of the Crowd: Inference, Learning, and Teaching

The unprecedented demand for large amount of data has catalyzed the trend of combining human insights with machine learning techniques, which facilitate the use of crowdsourcing to enlist label information both effectively and efficiently.…

Machine Learning · Statistics 2018-06-26 Yao Zhou, Jingrui He

How many labelers do you have? A closer look at gold-standard labels

The construction of most supervised learning datasets revolves around collecting multiple labels for each instance, then aggregating the labels to form a type of "gold-standard". We question the wisdom of this pipeline by developing a…

Statistics Theory · Mathematics 2024-06-06 Chen Cheng, Hilal Asi, John Duchi

More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning

We consider the weakly supervised binary classification problem where the labels are randomly flipped with probability $1 - \alpha$. Although there exist numerous algorithms for this problem, it remains theoretically unexplored how the…

Machine Learning · Computer Science 2019-07-16 Xinyang Yi, Zhaoran Wang, Zhuoran Yang, Constantine Caramanis, Han Liu

A General Model for Aggregating Annotations Across Simple, Complex, and Multi-Object Annotation Tasks

Human annotations are vital to supervised learning, yet annotators often disagree on the correct label, especially as annotation tasks increase in complexity. A strategy to improve label quality is to ask multiple annotators to label the…

Machine Learning · Computer Science 2023-12-22 Alexander Braylan, Madalyn Marabella, Omar Alonso, Matthew Lease

Estimating the Accuracies of Multiple Classifiers Without Labeled Data

In various situations one is given only the predictions of multiple classifiers over a large unlabeled test data. This scenario raises the following questions: Without any labeled data and without any a-priori knowledge about the…

Machine Learning · Statistics 2014-10-31 Ariel Jaffe, Boaz Nadler, Yuval Kluger

Robust Assignment of Labels for Active Learning with Sparse and Noisy Annotations

Supervised classification algorithms are used to solve a growing number of real-life problems around the globe. Their performance is strictly connected with the quality of labels used in training. Unfortunately, acquiring good-quality…

Machine Learning · Computer Science 2024-07-08 Daniel Kałuża, Andrzej Janusz, Dominik Ślęzak

Global Multiclass Classification and Dataset Construction via Heterogeneous Local Experts

In the domains of dataset construction and crowdsourcing, a notable challenge is to aggregate labels from a heterogeneous set of labelers, each of whom is potentially an expert in some subset of tasks (and less reliable in others). To…

Machine Learning · Computer Science 2021-01-07 Surin Ahn, Ayfer Ozgur, Mert Pilanci