Related papers: Attacking Misinformation Detection Using Adversari…

Generating Valid and Natural Adversarial Examples with Large Language Models

Deep learning-based natural language processing (NLP) models, particularly pre-trained language models (PLMs), have been revealed to be vulnerable to adversarial attacks. However, the adversarial examples generated by many mainstream…

Computation and Language · Computer Science 2023-11-21 Zimu Wang, Wei Wang, Qi Chen, Qiufeng Wang, Anh Nguyen

A Generative Adversarial Attack for Multilingual Text Classifiers

Current adversarial attack algorithms, where an adversary changes a text to fool a victim model, have been repeatedly shown to be effective against text classifiers. These attacks, however, generally assume that the victim model is…

Computation and Language · Computer Science 2024-01-17 Tom Roth, Inigo Jauregi Unanue, Alsharif Abuadbba, Massimo Piccardi

Verifying the Robustness of Automatic Credibility Assessment

Text classification methods have been widely investigated as a way to detect content of low credibility: fake news, social media bots, propaganda, etc. Quite accurate models (likely based on deep neural networks) help in moderating public…

Computation and Language · Computer Science 2026-03-04 Piotr Przybyła, Alexander Shvets, Horacio Saggion

Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model

Recently, generating adversarial examples has become an important means of measuring robustness of a deep learning model. Adversarial examples help us identify the susceptibilities of the model and further counter those vulnerabilities by…

Machine Learning · Computer Science 2021-03-03 Prashanth Vijayaraghavan, Deb Roy

Adversarial Attacks and Defenses for Social Network Text Processing Applications: Techniques, Challenges and Future Research Directions

The growing use of social media has led to the development of several Machine Learning (ML) and Natural Language Processing (NLP) tools to process the unprecedented amount of social media content to make actionable decisions. However, these…

Computation and Language · Computer Science 2021-10-28 Izzat Alsmadi, Kashif Ahmad, Mahmoud Nazzal, Firoj Alam, Ala Al-Fuqaha, Abdallah Khreishah, Abdulelah Algosaibi

Fighting Fire with Fire: Adversarial Prompting to Generate a Misinformation Detection Dataset

The recent success in language generation capabilities of large language models (LLMs), such as GPT, Bard, Llama, etc., can potentially lead to concerns about their possible misuse in inducing mass agitation and communal hatred via…

Computation and Language · Computer Science 2024-01-10 Shrey Satapara, Parth Mehta, Debasis Ganguly, Sandip Modha

Generating Natural Language Adversarial Examples on a Large Scale with Generative Models

Today text classification models have been widely used. However, these classifiers are found to be easily fooled by adversarial examples. Fortunately, standard attacking methods generate adversarial texts in a pair-wise way, that is, an…

Computation and Language · Computer Science 2020-03-24 Yankun Ren, Jianbin Lin, Siliang Tang, Jun Zhou, Shuang Yang, Yuan Qi, Xiang Ren

Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification

Research shows that natural language processing models are generally considered to be vulnerable to adversarial attacks; but recent work has drawn attention to the issue of validating these adversarial inputs against certain criteria (e.g.,…

Computation and Language · Computer Science 2021-09-10 Maximilian Mozes, Max Bartolo, Pontus Stenetorp, Bennett Kleinberg, Lewis D. Griffin

One word at a time: adversarial attacks on retrieval models

Adversarial examples, generated by applying small perturbations to input features, are widely used to fool classifiers and measure their robustness to noisy inputs. However, little work has been done to evaluate the robustness of ranking…

Information Retrieval · Computer Science 2020-08-06 Nisarg Raval, Manisha Verma

Debunking with Dialogue? Exploring AI-Generated Counterspeech to Challenge Conspiracy Theories

Counterspeech is a key strategy against harmful online content, but scaling expert-driven efforts is challenging. Large Language Models (LLMs) present a potential solution, though their use in countering conspiracy theories is…

Computation and Language · Computer Science 2025-08-04 Mareike Lisker, Christina Gottschalk, Helena Mihaljević

Viable Threat on News Reading: Generating Biased News Using Natural Language Models

Recent advancements in natural language generation have raised serious concerns. High-performance language models are widely used for language generation tasks because they are able to produce fluent and meaningful sentences. These models…

Computation and Language · Computer Science 2020-10-06 Saurabh Gupta, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen

A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers

Text classifiers are vulnerable to adversarial examples -- correctly-classified examples that are deliberately transformed to be misclassified while satisfying acceptability constraints. The conventional approach to finding adversarial…

Computation and Language · Computer Science 2024-05-21 Tom Roth, Inigo Jauregi Unanue, Alsharif Abuadbba, Massimo Piccardi

Toward a Safer Web: Multilingual Multi-Agent LLMs for Mitigating Adversarial Misinformation Attacks

The rapid spread of misinformation on digital platforms threatens public discourse, emotional stability, and decision-making. While prior work has explored various adversarial attacks in misinformation detection, the specific…

Computation and Language · Computer Science 2025-10-13 Nouar Aldahoul, Yasir Zaki

Large Language Models and Provenance Metadata for Determining the Relevance of Images and Videos in News Stories

The most effective misinformation campaigns are multimodal, often combining text with images and videos taken out of context -- or fabricating them entirely -- to support a given narrative. Contemporary methods for detecting misinformation,…

Computation and Language · Computer Science 2025-02-17 Tomas Peterka, Matyas Bohacek

An Adversarial Benchmark for Fake News Detection Models

With the proliferation of online misinformation, fake news detection has gained importance in the artificial intelligence community. In this paper, we propose an adversarial benchmark that tests the ability of fake news detectors to reason…

Computation and Language · Computer Science 2022-01-05 Lorenzo Jaime Yu Flores, Yiding Hao

BeamAttack: Generating High-quality Textual Adversarial Examples through Beam Search and Mixed Semantic Spaces

Natural language processing models based on neural networks are vulnerable to adversarial examples. These adversarial examples are imperceptible to human readers but can mislead models to make the wrong predictions. In a black-box setting,…

Computation and Language · Computer Science 2023-03-14 Hai Zhu, Qingyang Zhao, Yuren Wu

Identifying Human Strategies for Generating Word-Level Adversarial Examples

Adversarial examples in NLP are receiving increasing research attention. One line of investigation is the generation of word-level adversarial examples against fine-tuned Transformer models that preserve naturalness and grammaticality.…

Computation and Language · Computer Science 2022-10-24 Maximilian Mozes, Bennett Kleinberg, Lewis D. Griffin

MisinfoEval: Generative AI in the Era of "Alternative Facts"

The spread of misinformation on social media platforms threatens democratic processes, contributes to massive economic losses, and endangers public health. Many efforts to address misinformation focus on a knowledge deficit model and…

Computation and Language · Computer Science 2024-10-16 Saadia Gabriel, Liang Lyu, James Siderius, Marzyeh Ghassemi, Jacob Andreas, Asu Ozdaglar

Adversarial Reframing: A Framework for Targeted Generation in Language Models

Large Language Models (LLMs) are widely deployed in diverse real-world settings, yet remain vulnerable to jailbreaking, where prompt-based attacks bypass safety filters. We present THREAT (Targeted Harmful generation via Reframing and…

Cryptography and Security · Computer Science 2026-05-22 Shahnewaz Karim Sakib, Swati Kar, Anindya Bijoy Das

Adversarial Math Word Problem Generation

Large language models (LLMs) have significantly transformed the educational landscape. As current plagiarism detection tools struggle to keep pace with LLMs' rapid advancements, the educational community faces the challenge of assessing…

Computation and Language · Computer Science 2024-06-18 Roy Xie, Chengxuan Huang, Junlin Wang, Bhuwan Dhingra