Related papers: Logically Consistent Adversarial Attacks for Soft …

Robust Deep Learning Models Against Semantic-Preserving Adversarial Attack

Deep learning models can be fooled by small $l_p$-norm adversarial perturbations and natural perturbations in terms of attributes. Although the robustness against each perturbation has been explored, it remains a challenge to address the…

Machine Learning · Computer Science 2023-04-11 Dashan Gao, Yunce Zhao, Yinghua Yao, Zeqi Zhang, Bifei Mao, Xin Yao

Adversarial Attack and Defense of Structured Prediction Models

Building an effective adversarial attacker and elaborating on countermeasures for adversarial attacks for natural language processing (NLP) have attracted a lot of research in recent years. However, most of the existing approaches focus on…

Computation and Language · Computer Science 2020-10-20 Wenjuan Han, Liwen Zhang, Yong Jiang, Kewei Tu

L-AutoDA: Leveraging Large Language Models for Automated Decision-based Adversarial Attacks

In the rapidly evolving field of machine learning, adversarial attacks present a significant challenge to model robustness and security. Decision-based attacks, which only require feedback on the decision of a model rather than detailed…

Cryptography and Security · Computer Science 2024-05-24 Ping Guo, Fei Liu, Xi Lin, Qingchuan Zhao, Qingfu Zhang

Adversarial Robustness of Vision in Open Foundation Models

With the increase in deep learning, it becomes increasingly difficult to understand the model in which AI systems can identify objects. Thus, an adversary could aim to modify an image by adding unseen elements, which will confuse the AI in…

Computer Vision and Pattern Recognition · Computer Science 2025-12-22 Jonathon Fox, William J Buchanan, Pavlos Papadopoulos

"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks

Adversarial attacks are a major challenge faced by current machine learning research. These purposely crafted inputs fool even the most advanced models, precluding their deployment in safety-critical applications. Extensive research in…

Artificial Intelligence · Computer Science 2023-06-30 Edoardo Mosca, Shreyash Agarwal, Javier Rando, Georg Groh

Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness

Insensitivity to semantically-preserving variations of prompts (paraphrases) is crucial for reliable behavior and real-world deployment of large language models. However, language models exhibit significant performance degradation when…

Computation and Language · Computer Science 2025-03-04 Tingchen Fu, Fazl Barez

Layer-wise Regularized Adversarial Training using Layers Sustainability Analysis (LSA) framework

Deep neural network models are used today in various applications of artificial intelligence, and strengthening them against adversarial attacks is of particular importance. An appropriate solution to adversarial attacks is…

Computer Vision and Pattern Recognition · Computer Science 2022-02-16 Mohammad Khalooei, Mohammad Mehdi Homayounpour, Maryam Amirmazlaghani

Defending Adversarial Attacks by Correcting logits

Generating and eliminating adversarial examples has been an intriguing topic in the field of deep learning. While previous research verified that adversarial attacks are often fragile and can be defended via image-level processing, it…

Machine Learning · Computer Science 2019-06-27 Yifeng Li, Lingxi Xie, Ya Zhang, Rui Zhang, Yanfeng Wang, Qi Tian

A Visual Analytics Framework for Adversarial Text Generation

This paper presents a framework which enables a user to more easily make corrections to adversarial texts. While attack algorithms have been demonstrated to automatically build adversaries, changes made by the algorithms can often have poor…

Human-Computer Interaction · Computer Science 2020-12-21 Brandon Laughlin, Christopher Collins, Karthik Sankaranarayanan, Khalil El-Khatib

Towards Adversarially Robust Continual Learning

Recent studies show that models trained by continual learning can achieve performance comparable to standard supervised learning, and the learning flexibility of continual learning models enables their wide application in the real…

Machine Learning · Computer Science 2023-04-03 Tao Bai, Chen Chen, Lingjuan Lyu, Jun Zhao, Bihan Wen

How adversarial attacks can disrupt seemingly stable accurate classifiers

Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data. Paradoxically, empirical evidence indicates that even systems which are…

Machine Learning · Computer Science 2024-09-13 Oliver J. Sutton, Qinghua Zhou, Ivan Y. Tyukin, Alexander N. Gorban, Alexander Bastounis, Desmond J. Higham

KNOW How to Make Up Your Mind! Adversarially Detecting and Alleviating Inconsistencies in Natural Language Explanations

While recent works have been considerably improving the quality of the natural language explanations (NLEs) generated by a model to justify its predictions, there is very limited research in detecting and alleviating inconsistencies among…

Computation and Language · Computer Science 2023-06-06 Myeongjun Jang, Bodhisattwa Prasad Majumder, Julian McAuley, Thomas Lukasiewicz, Oana-Maria Camburu

Latent Personality Alignment: Improving Harmlessness Without Mentioning Harms

Current adversarial robustness methods for large language models require extensive datasets of harmful prompts (thousands to hundreds of thousands of examples), yet remain vulnerable to novel attack vectors and distributional shifts. We…

Artificial Intelligence · Computer Science 2026-05-12 Linh Le, David Williams-King, Mohamed Amine Merzouk, Aton Kamanda, Adam Oberman

QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models

In typical multimodal tasks, such as Visual Question Answering (VQA), adversarial attacks targeting a specific image and question can lead large vision-language models (LVLMs) to provide incorrect answers. However, it is common for a single…

Computer Vision and Pattern Recognition · Computer Science 2025-04-16 Yudong Zhang, Ruobing Xie, Jiansheng Chen, Xingwu Sun, Zhanhui Kang, Yu Wang

A Differentiable Language Model Adversarial Attack on Text Classifiers

Robustness of huge Transformer-based models for natural language processing is an important issue due to their capabilities and wide adoption. One way to understand and improve robustness of these models is an exploration of an adversarial…

Computation and Language · Computer Science 2021-07-26 Ivan Fursov, Alexey Zaytsev, Pavel Burnyshev, Ekaterina Dmitrieva, Nikita Klyuchnikov, Andrey Kravchenko, Ekaterina Artemova, Evgeny Burnaev

A Generative Adversarial Attack for Multilingual Text Classifiers

Current adversarial attack algorithms, where an adversary changes a text to fool a victim model, have been repeatedly shown to be effective against text classifiers. These attacks, however, generally assume that the victim model is…

Computation and Language · Computer Science 2024-01-17 Tom Roth, Inigo Jauregi Unanue, Alsharif Abuadbba, Massimo Piccardi

Towards Highly Transferable Vision-Language Attack via Semantic-Augmented Dynamic Contrastive Interaction

With the rapid advancement and widespread application of vision-language pre-training (VLP) models, their vulnerability to adversarial attacks has become a critical concern. In general, the adversarial examples can typically be designed to…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Yuanbo Li, Tianyang Xu, Cong Hu, Tao Zhou, Xiao-Jun Wu, Josef Kittler

Advancing Adversarial Robustness Through Adversarial Logit Update

Deep Neural Networks are susceptible to adversarial perturbations. Adversarial training and adversarial purification are among the most widely recognized defense strategies. Although these methods have different underlying logic, both rely…

Machine Learning · Computer Science 2023-08-30 Hao Xuan, Peican Zhu, Xingyu Li

Fooling the Textual Fooler via Randomizing Latent Representations

Despite outstanding performance in a variety of NLP tasks, recent studies have revealed that NLP models are vulnerable to adversarial attacks that slightly perturb the input to cause the models to misbehave. Among these attacks, adversarial…

Computation and Language · Computer Science 2024-06-11 Duy C. Hoang, Quang H. Nguyen, Saurav Manchanda, MinLong Peng, Kok-Seng Wong, Khoa D. Doan

Adversarial Math Word Problem Generation

Large language models (LLMs) have significantly transformed the educational landscape. As current plagiarism detection tools struggle to keep pace with LLMs' rapid advancements, the educational community faces the challenge of assessing…

Computation and Language · Computer Science 2024-06-18 Roy Xie, Chengxuan Huang, Junlin Wang, Bhuwan Dhingra