Related papers: Annotation Sensitivity: Training Data Collection M…

200 papers

Classifiers tend to propagate biases present in the data on which they are trained. Hence, it is important to understand how the demographic identities of the annotators of comments affect the fairness of the resulting model. In this paper,…

Computation and Language · Computer Science 2021-06-07 Elizabeth Excell, Noura Al Moubayed

The rise of online platforms exacerbated the spread of hate speech, demanding scalable and effective detection. However, the accuracy of hate speech detection systems heavily relies on human-labeled data, which is inherently susceptible to…

Computation and Language · Computer Science 2025-06-13 Tommaso Giorgi, Lorenzo Cima, Tiziano Fagni, Marco Avvenuti, Stefano Cresci

In this work, we explore the capability of Large Language Models (LLMs) to annotate hate speech and abusiveness while considering predefined annotator personas within the strong-to-weak data perspectivism spectra. We evaluated LLM-generated…

Computation and Language · Computer Science 2025-08-26 Olufunke O. Sarumi, Charles Welch, Daniel Braun, Jörg Schlötterer

Since state-of-the-art approaches to offensive language detection rely on supervised learning, it is crucial to quickly adapt them to the continuously evolving scenario of social media. While several approaches have been proposed to tackle…

Computation and Language · Computer Science 2022-10-17 Elisa Leonardelli, Stefano Menini, Alessio Palmero Aprosio, Marco Guerini, Sara Tonelli

Data annotation, the practice of assigning descriptive labels to raw data, is pivotal in optimizing the performance of machine learning models. However, it is a resource-intensive process susceptible to biases introduced by annotators. The…

Supervised approaches generally rely on majority-based labels. However, it is hard to achieve high agreement among annotators in subjective tasks such as hate speech detection. Existing neural network models principally regard labels as…

Computation and Language · Computer Science 2023-01-11 Wenjie Yin, Vibhor Agarwal, Aiqi Jiang, Arkaitz Zubiaga, Nishanth Sastry

Social stereotypes negatively impact individuals' judgements about different groups and may have a critical role in how people understand language directed toward minority social groups. Here, we assess the role of social stereotypes in the…

Computation and Language · Computer Science 2021-10-29 Aida Mostafazadeh Davani, Mohammad Atari, Brendan Kennedy, Morteza Dehghani

Though majority vote among annotators is typically used for ground truth labels in natural language processing, annotator disagreement in tasks such as hate speech detection may reflect differences in opinion across groups, not noise. Thus,…

Computation and Language · Computer Science 2024-03-19 Eve Fleisig, Rediet Abebe, Dan Klein

Crowdsourcing has been the prevalent paradigm for creating natural language understanding datasets in recent years. A common crowdsourcing practice is to recruit a small number of high-quality workers, and have them massively generate…

Computation and Language · Computer Science 2019-08-29 Mor Geva, Yoav Goldberg, Jonathan Berant

Hate speech detection is a socially sensitive and inherently subjective task, with judgments often varying based on personal traits. While prior work has examined how socio-demographic factors influence annotation, the impact of personality…

Computation and Language · Computer Science 2025-06-11 Shuzhou Yuan, Ercong Nie, Mario Tawfelis, Helmut Schmid, Hinrich Schütze, Michael Färber

Crowdsourced annotation is vital to both collecting labelled data to train and test automated content moderation systems and to support human-in-the-loop review of system decisions. However, annotation tasks such as judging hate speech are…

Human-Computer Interaction · Computer Science 2023-09-06 Danula Hettiachchi, Indigo Holcombe-James, Stephanie Livingstone, Anjalee de Silva, Matthew Lease, Flora D. Salim, Mark Sanderson

Hate speech detection is a crucial task, especially on social media, where harmful content can spread quickly. Implementing machine learning models to automatically identify and address hate speech is essential for mitigating its impact and…

Computation and Language · Computer Science 2025-08-19 Somaiyeh Dehghan, Mehmet Umut Sen, Berrin Yanikoglu

Human data annotation, especially when involving experts, is often treated as an objective reference. However, many annotation tasks are inherently subjective, and annotators' judgments may evolve over time. This study investigates changes…

What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the…

Computation and Language · Computer Science 2021-08-31 Igor Mozetic, Miha Grcar, Jasmina Smailovic

It is common practice in text classification to only use one majority label for model training even if a dataset has been annotated by multiple annotators. Doing so can remove valuable nuances and diverse perspectives inherent in the…

Computation and Language · Computer Science 2024-09-27 Jin Xu, Mariët Theune, Daniel Braun

Annotators exhibit disagreement during data labeling, which can be termed as annotator label uncertainty. Annotator label uncertainty manifests in variations of labeling quality. Training with a single low-quality annotation per sample…

Computer Vision and Pattern Recognition · Computer Science 2024-03-18 Chen Zhou, Mohit Prabhushankar, Ghassan AlRegib

Human annotations are an important source of information in the development of natural language understanding approaches. As under the pressure of productivity annotators can assign different labels to a given text, the quality of produced…

Computation and Language · Computer Science 2020-10-29 Kristian Miok, Gregor Pirs, Marko Robnik-Sikonja

Annotation guidelines used to guide the annotation of training and evaluation datasets can have a considerable impact on the quality of machine learning models. In this study, we explore the effects of annotation guidelines on the quality…

Information Retrieval · Computer Science 2018-10-15 Faiz Ali Shah, Kairit Sirts, Dietmar Pfahl

This study uses the cosine similarity ratio, embedding regression, and manual re-annotation to diagnose hate speech classification. We begin by computing cosine similarity ratio on a dataset "Measuring Hate Speech" that contains 135,556…

Computation and Language · Computer Science 2024-11-27 Xilin Yang

Machine learning (ML) and artificial intelligence (AI) systems rely heavily on human-annotated data for training and evaluation. A major challenge in this context is the occurrence of annotation errors, as their effects can degrade model…

Machine Learning · Computer Science 2024-09-27 Heinrich Peters, Alireza Hashemi, James Rae