Related papers: HateCheck: Functional Tests for Hate Speech Detect…

Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models

Hate speech detection models are typically evaluated on held-out test sets. However, this risks painting an incomplete and potentially misleading picture of model performance because of increasingly well-documented systematic gaps and…

Computation and Language · Computer Science 2022-06-22 Paul Röttger, Haitham Seelawi, Debora Nozza, Zeerak Talat, Bertie Vidgen

Checking HateCheck: a cross-functional analysis of behaviour-aware learning for hate speech detection

Behavioural testing -- verifying system capabilities by validating human-designed input-output pairs -- is an alternative evaluation method of natural language processing systems proposed to address the shortcomings of the standard…

Computation and Language · Computer Science 2022-07-05 Pedro Henrique Luz de Araujo, Benjamin Roth

GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection?

Online hate detection suffers from biases incurred in data sampling, annotation, and model pre-training. Therefore, measuring the averaged performance over all examples in held-out test data is inadequate. Instead, we must identify specific…

Computation and Language · Computer Science 2024-05-28 Yiping Jin, Leo Wanner, Alexander Shvets

HateCheckHIn: Evaluating Hindi Hate Speech Detection Models

Due to the sheer volume of online hate, the AI and NLP communities have started building models to detect such hateful content. Recently, multilingual hate is a major emerging challenge for automated detection where code-mixing or more than…

Computation and Language · Computer Science 2022-05-12 Mithun Das, Punyajoy Saha, Binny Mathew, Animesh Mukherjee

Evaluating ChatGPT's Performance for Multilingual and Emoji-based Hate Speech Detection

Hate speech is a severe issue that affects many online platforms. So far, several studies have been performed to develop robust hate speech detection systems. Large language models like ChatGPT have recently shown a great promise in…

Computation and Language · Computer Science 2023-05-24 Mithun Das, Saurabh Kumar Pandey, Animesh Mukherjee

DefVerify: Do Hate Speech Models Reflect Their Dataset's Definition?

When building a predictive model, it is often difficult to ensure that application-specific requirements are encoded by the model that will eventually be deployed. Consider researchers working on hate speech detection. They will have an…

Computation and Language · Computer Science 2025-01-14 Urja Khurana, Eric Nalisnick, Antske Fokkens

Deep Learning for Hate Speech Detection: A Comparative Study

Automated hate speech detection is an important tool in combating the spread of hate speech, particularly in social media. Numerous methods have been developed for the task, including a recent proliferation of deep-learning based…

Computation and Language · Computer Science 2023-12-08 Jitendra Singh Malik, Hezhe Qiao, Guansong Pang, Anton van den Hengel

SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore

To address the limitations of current hate speech detection models, we introduce SGHateCheck, a novel framework designed for the linguistic and cultural context of Singapore and Southeast Asia. It extends the functional testing…

Computation and Language · Computer Science 2024-05-06 Ri Chi Ng, Nirmalendu Prakash, Ming Shan Hee, Kenny Tsu Wei Choo, Roy Ka-Wei Lee

Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate

Detecting online hate is a complex task, and low-performing models have harmful consequences when used for sensitive applications such as content moderation. Emoji-based hate is an emerging challenge for automated detection. We present…

Computation and Language · Computer Science 2022-05-09 Hannah Rose Kirk, Bertram Vidgen, Paul Röttger, Tristan Thrush, Scott A. Hale

All You Need is "Love": Evading Hate-speech Detection

With the spread of social networks and their unfortunate use for hate speech, automatic detection of the latter has become a pressing problem. In this paper, we reproduce seven state-of-the-art hate speech detection models from prior work,…

Computation and Language · Computer Science 2018-11-06 Tommi Gröndahl, Luca Pajola, Mika Juuti, Mauro Conti, N. Asokan

SEAHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Southeast Asia

Hate speech detection relies heavily on linguistic resources, which are primarily available in high-resource languages such as English and Chinese, creating barriers for researchers and platforms developing tools for low-resource languages…

Computation and Language · Computer Science 2026-03-18 Ri Chi Ng, Aditi Kumaresan, Yujia Hu, Roy Ka-Wei Lee

HateXScore: A Metric Suite for Evaluating Reasoning Quality in Hate Speech Explanations

Hateful speech detection is a key component of content moderation, yet current evaluation frameworks rarely assess why a text is deemed hateful. We introduce HateXScore, a four-component metric suite designed to evaluate the…

Computation and Language · Computer Science 2026-01-21 Yujia Hu, Roy Ka-Wei Lee

Advancing Hate Speech Detection with Transformers: Insights from the MetaHate

Hate speech is a widespread and harmful form of online discourse, encompassing slurs and defamatory posts that can have serious social, psychological, and sometimes physical impacts on targeted individuals and communities. As social media…

Machine Learning · Computer Science 2025-08-08 Santosh Chapagain, Shah Muhammad Hamdi, Soukaina Filali Boubrahimi

HatePrototypes: Interpretable and Transferable Representations for Implicit and Explicit Hate Speech Detection

Optimization of offensive content moderation models for different types of hateful messages is typically achieved through continued pre-training or fine-tuning on new hate speech benchmarks. However, existing benchmarks mainly address…

Computation and Language · Computer Science 2026-04-07 Irina Proskurina, Marc-Antoine Carpentier, Julien Velcin

Empirical Evaluation of Public HateSpeech Datasets

Despite the extensive communication benefits offered by social media platforms, numerous challenges must be addressed to ensure user safety. One of the most significant risks faced by users on these platforms is targeted hate speech. Social…

Computation and Language · Computer Science 2024-07-18 Sadar Jaf, Basel Barakat

Towards generalisable hate speech detection: a review on obstacles and solutions

Hate speech is one type of harmful online content which directly attacks or promotes hate towards a group or an individual member based on their actual or perceived aspects of identity, such as ethnicity, religion, and sexual orientation.…

Computation and Language · Computer Science 2021-02-18 Wenjie Yin, Arkaitz Zubiaga

Detecting Online Hate Speech Using Context Aware Models

In the wake of a polarizing election, the cyber world is laden with hate speech. Context accompanying a hate speech text is useful for identifying hate speech, which however has been largely overlooked in existing datasets and hate speech…

Computation and Language · Computer Science 2018-05-23 Lei Gao, Ruihong Huang

DeepHate: Hate Speech Detection via Multi-Faceted Text Representations

Online hate speech is an important issue that breaks the cohesiveness of online social communities and even raises public safety concerns in our societies. Motivated by this rising issue, researchers have developed many traditional machine…

Computation and Language · Computer Science 2021-03-23 Rui Cao, Roy Ka-Wei Lee, Tuan-Anh Hoang

HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection

Hate speech is a challenging issue plaguing the online social media. While better models for hate speech detection are continuously being developed, there is little research on the bias and interpretability aspects of hate speech. In this…

Computation and Language · Computer Science 2022-04-13 Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, Animesh Mukherjee

Efficient Hate Speech Detection: Evaluating 38 Models from Traditional Methods to Transformers

The proliferation of hate speech on social media necessitates automated detection systems that balance accuracy with computational efficiency. This study evaluates 38 model configurations in detecting hate speech across datasets ranging…

Computation and Language · Computer Science 2025-09-19 Mahmoud Abusaqer, Jamil Saquer, Hazim Shatnawi