Related papers: Annotation Sensitivity: Training Data Collection M…

Towards Equal Gender Representation in the Annotations of Toxic Language Detection

Classifiers tend to propagate biases present in the data on which they are trained. Hence, it is important to understand how the demographic identities of the annotators of comments affect the fairness of the resulting model. In this paper,…

Computation and Language · Computer Science 2021-06-07 Elizabeth Excell, Noura Al Moubayed

Human and LLM Biases in Hate Speech Annotations: A Socio-Demographic Analysis of Annotators and Targets

The rise of online platforms exacerbated the spread of hate speech, demanding scalable and effective detection. However, the accuracy of hate speech detection systems heavily relies on human-labeled data, which is inherently susceptible to…

Computation and Language · Computer Science 2025-06-13 Tommaso Giorgi, Lorenzo Cima, Tiziano Fagni, Marco Avvenuti, Stefano Cresci

The Impact of Annotator Personas on LLM Behavior Across the Perspectivism Spectrum

In this work, we explore the capability of Large Language Models (LLMs) to annotate hate speech and abusiveness while considering predefined annotator personas within the strong-to-weak data perspectivism spectra. We evaluated LLM-generated…

Computation and Language · Computer Science 2025-08-26 Olufunke O. Sarumi, Charles Welch, Daniel Braun, Jörg Schlötterer

Agreeing to Disagree: Annotating Offensive Language Datasets with Annotators' Disagreement

Since state-of-the-art approaches to offensive language detection rely on supervised learning, it is crucial to quickly adapt them to the continuously evolving scenario of social media. While several approaches have been proposed to tackle…

Computation and Language · Computer Science 2022-10-17 Elisa Leonardelli, Stefano Menini, Alessio Palmero Aprosio, Marco Guerini, Sara Tonelli

Investigating Annotator Bias in Large Language Models for Hate Speech Detection

Data annotation, the practice of assigning descriptive labels to raw data, is pivotal in optimizing the performance of machine learning models. However, it is a resource-intensive process susceptible to biases introduced by annotators. The…

Computation and Language · Computer Science 2024-11-19 Amit Das, Zheng Zhang, Najib Hasan, Souvika Sarkar, Fatemeh Jamshidi, Tathagata Bhattacharya, Mostafa Rahgouy, Nilanjana Raychawdhary, Dongji Feng, Vinija Jain, Aman Chadha, Mary Sandage, Lauramarie Pope, Gerry Dozier, Cheryl Seals

AnnoBERT: Effectively Representing Multiple Annotators' Label Choices to Improve Hate Speech Detection

Supervised approaches generally rely on majority-based labels. However, it is hard to achieve high agreement among annotators in subjective tasks such as hate speech detection. Existing neural network models principally regard labels as…

Computation and Language · Computer Science 2023-01-11 Wenjie Yin, Vibhor Agarwal, Aiqi Jiang, Arkaitz Zubiaga, Nishanth Sastry

Hate Speech Classifiers Learn Human-Like Social Stereotypes

Social stereotypes negatively impact individuals' judgements about different groups and may have a critical role in how people understand language directed toward minority social groups. Here, we assess the role of social stereotypes in the…

Computation and Language · Computer Science 2021-10-29 Aida Mostafazadeh Davani, Mohammad Atari, Brendan Kennedy, Morteza Dehghani

When the Majority is Wrong: Modeling Annotator Disagreement for Subjective Tasks

Though majority vote among annotators is typically used for ground truth labels in natural language processing, annotator disagreement in tasks such as hate speech detection may reflect differences in opinion across groups, not noise. Thus,…

Computation and Language · Computer Science 2024-03-19 Eve Fleisig, Rediet Abebe, Dan Klein

Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets

Crowdsourcing has been the prevalent paradigm for creating natural language understanding datasets in recent years. A common crowdsourcing practice is to recruit a small number of high-quality workers, and have them massively generate…

Computation and Language · Computer Science 2019-08-29 Mor Geva, Yoav Goldberg, Jonathan Berant

Hateful Person or Hateful Model? Investigating the Role of Personas in Hate Speech Detection by Large Language Models

Hate speech detection is a socially sensitive and inherently subjective task, with judgments often varying based on personal traits. While prior work has examined how socio-demographic factors influence annotation, the impact of personality…

Computation and Language · Computer Science 2025-06-11 Shuzhou Yuan, Ercong Nie, Mario Tawfelis, Helmut Schmid, Hinrich Schütze, Michael Färber

How Crowd Worker Factors Influence Subjective Annotations: A Study of Tagging Misogynistic Hate Speech in Tweets

Crowdsourced annotation is vital to both collecting labelled data to train and test automated content moderation systems and to support human-in-the-loop review of system decisions. However, annotation tasks such as judging hate speech are…

Human-Computer Interaction · Computer Science 2023-09-06 Danula Hettiachchi, Indigo Holcombe-James, Stephanie Livingstone, Anjalee de Silva, Matthew Lease, Flora D. Salim, Mark Sanderson

Dealing with Annotator Disagreement in Hate Speech Classification

Hate speech detection is a crucial task, especially on social media, where harmful content can spread quickly. Implementing machine learning models to automatically identify and address hate speech is essential for mitigating its impact and…

Computation and Language · Computer Science 2025-08-19 Somaiyeh Dehghan, Mehmet Umut Sen, Berrin Yanikoglu

How Annotation Trains Annotators: Competence Development in Social Influence Recognition

Human data annotation, especially when involving experts, is often treated as an objective reference. However, many annotation tasks are inherently subjective, and annotators' judgments may evolve over time. This study investigates changes…

Computation and Language · Computer Science 2026-04-06 Maciej Markiewicz, Beata Bajcar, Wiktoria Mieleszczenko-Kowszewicz, Aleksander Szczęsny, Tomasz Adamczyk, Grzegorz Chodak, Karolina Ostrowska, Aleksandra Sawczuk, Jolanta Babiak, Jagoda Szklarczyk, Przemysław Kazienko

Multilingual Twitter Sentiment Classification: The Role of Human Annotators

What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the…

Computation and Language · Computer Science 2021-08-31 Igor Mozetic, Miha Grcar, Jasmina Smailovic

Leveraging Annotator Disagreement for Text Classification

It is common practice in text classification to only use one majority label for model training even if a dataset has been annotated by multiple annotators. Doing so can remove valuable nuances and diverse perspectives inherent in the…

Computation and Language · Computer Science 2024-09-27 Jin Xu, Mariët Theune, Daniel Braun

Perceptual Quality-based Model Training under Annotator Label Uncertainty

Annotators exhibit disagreement during data labeling, which can be termed as annotator label uncertainty. Annotator label uncertainty manifests in variations of labeling quality. Training with a single low-quality annotation per sample…

Computer Vision and Pattern Recognition · Computer Science 2024-03-18 Chen Zhou, Mohit Prabhushankar, Ghassan AlRegib

Bayesian Methods for Semi-supervised Text Annotation

Human annotations are an important source of information in the development of natural language understanding approaches. As under the pressure of productivity annotators can assign different labels to a given text, the quality of produced…

Computation and Language · Computer Science 2020-10-29 Kristian Miok, Gregor Pirs, Marko Robnik-Sikonja

The Impact of Annotation Guidelines and Annotated Data on Extracting App Features from App Reviews

Annotation guidelines used to guide the annotation of training and evaluation datasets can have a considerable impact on the quality of machine learning models. In this study, we explore the effects of annotation guidelines on the quality…

Information Retrieval · Computer Science 2018-10-15 Faiz Ali Shah, Kairit Sirts, Dietmar Pfahl

Diagnosing Hate Speech Classification: Where Do Humans and Machines Disagree, and Why?

This study uses the cosine similarity ratio, embedding regression, and manual re-annotation to diagnose hate speech classification. We begin by computing cosine similarity ratio on a dataset "Measuring Hate Speech" that contains 135,556…

Computation and Language · Computer Science 2024-11-27 Xilin Yang

Generalizable Error Modeling for Human Data Annotation: Evidence From an Industry-Scale Search Data Annotation Program

Machine learning (ML) and artificial intelligence (AI) systems rely heavily on human-annotated data for training and evaluation. A major challenge in this context is the occurrence of annotation errors, as their effects can degrade model…

Machine Learning · Computer Science 2024-09-27 Heinrich Peters, Alireza Hashemi, James Rae