Related papers: Auditing for Diversity using Representative Exampl…

Does the dataset meet your expectations? Explaining sample representation in image data

Since the behavior of a neural network model is adversely affected by a lack of diversity in training data, we present a method that identifies and explains such deficiencies. When a dataset is labeled, we note that annotations alone are…

Computer Vision and Pattern Recognition · Computer Science 2020-12-17 Dhasarathy Parthasarathy , Anton Johansson

From Random to Informed Data Selection: A Diversity-Based Approach to Optimize Human Annotation and Few-Shot Learning

A major challenge in Natural Language Processing is obtaining annotated data for supervised learning. An option is the use of crowdsourcing platforms for data annotation. However, crowdsourcing introduces issues related to the annotator's…

Computation and Language · Computer Science 2024-01-25 Alexandre Alcoforado , Thomas Palmeira Ferraz , Lucas Hideki Okamura , Israel Campos Fama , Arnold Moya Lavado , Bárbara Dias Bueno , Bruno Veloso , Anna Helena Reali Costa

Implicit Diversity in Image Summarization

Studies have shown that the people depicted in image search results tend to be of majority groups with respect to socially salient attributes. This skew goes beyond that which already exists in the world - e.g., Kay et al. showed that…

Machine Learning · Computer Science 2020-08-18 L. Elisa Celis , Vijay Keswani

Exploiting Diversity of Unlabeled Data for Label-Efficient Semi-Supervised Active Learning

The availability of large labeled datasets is the key component for the success of deep learning. However, annotating labels on large datasets is generally time-consuming and expensive. Active learning is a research area that addresses the…

Computer Vision and Pattern Recognition · Computer Science 2022-07-26 Felix Buchert , Nassir Navab , Seong Tae Kim

Perception-Driven Bias Detection in Machine Learning via Crowdsourced Visual Judgment

Machine learning systems are increasingly deployed in high-stakes domains, yet they remain vulnerable to bias: systematic disparities that disproportionately impact specific demographic groups. Traditional bias detection methods often depend…

Machine Learning · Computer Science 2025-06-16 Chirudeep Tupakula , Rittika Shamsuddin

Crowd Labeling: a survey

Recently, there has been a burst in the number of research projects on human computation via crowdsourcing. Multiple choice (or labeling) questions could be referred to as a common type of problem which is solved by this approach. As an…

Artificial Intelligence · Computer Science 2014-09-04 Jafar Muhammadi , Hamid Reza Rabiee , Abbas Hosseini

Auditing Gender Analyzers on Text Data

AI models have become extremely popular and accessible to the general public. However, they are continuously under the scanner due to their demonstrable biases toward various sections of the society like people of color and non-binary…

Computers and Society · Computer Science 2023-10-11 Siddharth D Jaiswal , Ankit Kumar Verma , Animesh Mukherjee

Dialect Diversity in Text Summarization on Twitter

Discussions on Twitter involve participation from different communities with different dialects and it is often necessary to summarize a large number of posts into a representative sample to provide a synopsis. Yet, any such representative…

Computers and Society · Computer Science 2021-04-06 Vijay Keswani , L. Elisa Celis

Crowdsourcing Feature Discovery via Adaptively Chosen Comparisons

We introduce an unsupervised approach to efficiently discover the underlying features in a data set via crowdsourcing. Our queries ask crowd members to articulate a feature common to two out of three displayed examples. In addition we also…

Machine Learning · Statistics 2015-04-02 James Y. Zou , Kamalika Chaudhuri , Adam Tauman Kalai

Leveraging Unlabeled Data for Crowd Counting by Learning to Rank

We propose a novel crowd counting approach that leverages abundantly available unlabeled crowd imagery in a learning-to-rank framework. To induce a ranking of cropped images, we use the observation that any sub-image of a crowded scene…

Computer Vision and Pattern Recognition · Computer Science 2018-03-09 Xialei Liu , Joost van de Weijer , Andrew D. Bagdanov

Data Coverage for Detecting Representation Bias in Image Datasets: A Crowdsourcing Approach

Existing machine learning models have proven to fail when it comes to their performance for minority groups, mainly due to biases in data. In particular, datasets, especially social data, are often not representative of minorities. In this…

Databases · Computer Science 2023-06-27 Melika Mousavi , Nima Shahbazi , Abolfazl Asudeh

Fair Diversity Maximization with Few Representatives

The diversity maximization problem is a well-studied problem where the goal is to find $k$ diverse items. Fair diversity maximization aims to select a diverse subset of $k$ items from a large dataset, while requiring that each group of items be…

Data Structures and Algorithms · Computer Science 2025-06-11 Florian Adriaens , Nikolaj Tatti

Semi-supervised Counting via Pixel-by-pixel Density Distribution Modelling

This paper focuses on semi-supervised crowd counting, where only a small portion of the training data are labeled. We formulate the pixel-wise density value to regress as a probability distribution, instead of a single deterministic value.…

Computer Vision and Pattern Recognition · Computer Science 2024-02-26 Hui Lin , Zhiheng Ma , Rongrong Ji , Yaowei Wang , Zhou Su , Xiaopeng Hong , Deyu Meng

Reducing Spatial Labeling Redundancy for Semi-supervised Crowd Counting

Labeling is onerous for crowd counting, as each individual in a crowd image must be annotated. Recently, several methods have been proposed for semi-supervised crowd counting to reduce the labeling effort. Given a limited labeling budget,…

Computer Vision and Pattern Recognition · Computer Science 2021-08-09 Yongtuo Liu , Sucheng Ren , Liangyu Chai , Hanjie Wu , Jing Qin , Dan Xu , Shengfeng He

Similarity Search for Efficient Active Learning and Search of Rare Concepts

Many active learning and search approaches are intractable for large-scale industrial settings with billions of unlabeled examples. Existing approaches search globally for the optimal examples to label, scaling linearly or even…

Machine Learning · Computer Science 2021-07-23 Cody Coleman , Edward Chou , Julian Katz-Samuels , Sean Culatana , Peter Bailis , Alexander C. Berg , Robert Nowak , Roshan Sumbaly , Matei Zaharia , I. Zeki Yalniz

A Data Management Approach for Dataset Selection Using Human Computation

As the number of applications that use machine learning algorithms increases, the need for labeled data useful for training such algorithms intensifies. Getting labels typically involves employing humans to do the annotation, which directly…

Machine Learning · Computer Science 2013-07-16 Alexandros Ntoulas , Omar Alonso , Vasilis Kandylas

Capturing Perspectives of Crowdsourced Annotators in Subjective Learning Tasks

Supervised classification heavily depends on datasets annotated by humans. However, in subjective tasks such as toxicity classification, these annotations often exhibit low agreement among raters. Annotations have commonly been aggregated…

Computation and Language · Computer Science 2024-05-17 Negar Mokhberian , Myrl G. Marmarelis , Frederic R. Hopp , Valerio Basile , Fred Morstatter , Kristina Lerman

Generalized People Diversity: Learning a Human Perception-Aligned Diversity Representation for People Images

Capturing the diversity of people in images is challenging: recent literature tends to focus on diversifying one or two attributes, requiring expensive attribute labels or building classifiers. We introduce a diverse people image ranking…

Computer Vision and Pattern Recognition · Computer Science 2024-01-26 Hansa Srinivasan , Candice Schumann , Aradhana Sinha , David Madras , Gbolahan Oluwafemi Olanubi , Alex Beutel , Susanna Ricco , Jilin Chen

Learnt quasi-transitive similarity for retrieval from large collections of faces

We are interested in identity-based retrieval of face sets from large unlabelled collections acquired in uncontrolled environments. Given a baseline algorithm for measuring the similarity of two face sets, the meta-algorithm introduced in…

Computer Vision and Pattern Recognition · Computer Science 2016-04-15 Ognjen Arandjelovic

Null-sampling for Interpretable and Fair Representations

We propose to learn invariant representations, in the data domain, to achieve interpretability in algorithmic fairness. Invariance implies a selectivity for high level, relevant correlations w.r.t. class label annotations, and a robustness…

Machine Learning · Computer Science 2020-08-13 Thomas Kehrenberg , Myles Bartlett , Oliver Thomas , Novi Quadrianto