Related papers: MultiRobustBench: Benchmarking Robustness Against …

RobustBench: a standardized adversarial robustness benchmark

As a research community, we are still lacking a systematic understanding of the progress on adversarial robustness which often makes it hard to identify the most promising ideas in training robust models. A key challenge in benchmarking…

Machine Learning · Computer Science · 2021-11-02 · Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, Matthias Hein

Rapid Response: Mitigating LLM Jailbreaks with a Few Examples

As large language models (LLMs) grow more powerful, ensuring their safety against misuse becomes crucial. While researchers have focused on developing robust defenses, no method has yet achieved complete invulnerability to attacks. We…

Computation and Language · Computer Science · 2024-11-13 · Alwin Peng, Julian Michael, Henry Sleight, Ethan Perez, Mrinank Sharma

Adversarial Robustness in Unsupervised Machine Learning: A Systematic Review

As the adoption of machine learning models increases, ensuring robust models against adversarial attacks is increasingly important. With unsupervised machine learning gaining more attention, ensuring it is robust against attacks is vital.…

Machine Learning · Computer Science · 2023-06-02 · Mathias Lundteigen Mohus, Jinyue Li

TabularBench: Benchmarking Adversarial Robustness for Tabular Deep Learning in Real-world Use-cases

While adversarial robustness in computer vision is a mature research field, fewer researchers have tackled the evasion attacks against tabular deep learning, and even fewer investigated robustification mechanisms and reliable defenses. We…

Machine Learning · Computer Science · 2024-08-15 · Thibault Simonetto, Salah Ghamizi, Maxime Cordy

A Comprehensive Evaluation Framework for Deep Model Robustness

Deep neural networks (DNNs) have achieved remarkable performance across a wide range of applications, while they are vulnerable to adversarial examples, which motivates the evaluation and benchmark of model robustness. However, current…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-02 · Jun Guo, Wei Bao, Jiakai Wang, Yuqing Ma, Xinghai Gao, Gang Xiao, Aishan Liu, Jian Dong, Xianglong Liu, Wenjun Wu

MULDEF: Multi-model-based Defense Against Adversarial Examples for Neural Networks

Despite being popularly used in many applications, neural network models have been found to be vulnerable to adversarial examples, i.e., carefully crafted examples aiming to mislead machine learning models. Adversarial examples can pose…

Machine Learning · Computer Science · 2019-07-30 · Siwakorn Srisakaokul, Yuhao Zhang, Zexuan Zhong, Wei Yang, Tao Xie, Bo Li

Survey of Adversarial Robustness in Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) have demonstrated exceptional performance in artificial intelligence by facilitating integrated understanding across diverse modalities, including text, images, video, audio, and speech. However,…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-19 · Chengze Jiang, Zhuangzhuang Wang, Minjing Dong, Jie Gui

On The Empirical Effectiveness of Unrealistic Adversarial Hardening Against Realistic Adversarial Attacks

While the literature on security attacks and defense of Machine Learning (ML) systems mostly focuses on unrealistic adversarial examples, recent research has raised concern about the under-explored field of realistic adversarial attacks and…

Machine Learning · Computer Science · 2023-05-23 · Salijona Dyrmishi, Salah Ghamizi, Thibault Simonetto, Yves Le Traon, Maxime Cordy

OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation

Recent advances in multi-modal large language models (MLLMs) have enabled unified perception-reasoning capabilities, yet these systems remain highly vulnerable to jailbreak attacks that bypass safety alignment and induce harmful behaviors.…

Cryptography and Security · Computer Science · 2025-12-09 · Xiaojun Jia, Jie Liao, Qi Guo, Teng Ma, Simeng Qin, Ranjie Duan, Tianlin Li, Yihao Huang, Zhitao Zeng, Dongxian Wu, Yiming Li, Wenqi Ren, Xiaochun Cao, Yang Liu

$\textit{MMJ-Bench}$: A Comprehensive Study on Jailbreak Attacks and Defenses for Multimodal Large Language Models

As deep learning advances, Large Language Models (LLMs) and their multimodal counterparts, Multimodal Large Language Models (MLLMs), have shown exceptional performance in many real-world tasks. However, MLLMs face significant security…

Cryptography and Security · Computer Science · 2024-10-23 · Fenghua Weng, Yue Xu, Chengyan Fu, Wenjie Wang

Testing Robustness Against Unforeseen Adversaries

Adversarial robustness research primarily focuses on L_p perturbations, and most defenses are developed with identical training-time and test-time adversaries. However, in real-world applications developers are unlikely to have access to…

Machine Learning · Computer Science · 2023-10-31 · Max Kaufmann, Daniel Kang, Yi Sun, Steven Basart, Xuwang Yin, Mantas Mazeika, Akul Arora, Adam Dziedzic, Franziska Boenisch, Tom Brown, Jacob Steinhardt, Dan Hendrycks

Multi-objective Search of Robust Neural Architectures against Multiple Types of Adversarial Attacks

Many existing deep learning models are vulnerable to adversarial examples that are imperceptible to humans. To address this issue, various methods have been proposed to design network architectures that are robust to one particular type of…

Machine Learning · Computer Science · 2021-01-19 · Jia Liu, Yaochu Jin

RobustBlack: Challenging Black-Box Adversarial Attacks on State-of-the-Art Defenses

Although adversarial robustness has been extensively studied in white-box settings, recent advances in black-box attacks (including transfer- and query-based approaches) are primarily benchmarked against weak defenses, leaving a significant…

Machine Learning · Computer Science · 2026-02-18 · Mohamed Djilani, Salah Ghamizi, Maxime Cordy

RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models

With the increasing use of large language models (LLMs), ensuring reliable performance in diverse, real-world environments is essential. Despite their remarkable achievements, LLMs often struggle with adversarial inputs, significantly…

Computation and Language · Computer Science · 2024-06-18 · Yuqing Wang, Yun Zhao

Adversarial Robustness Against the Union of Multiple Perturbation Models

Owing to the susceptibility of deep learning systems to adversarial attacks, there has been a great deal of work in developing (both empirically and certifiably) robust classifiers. While most work has defended against a single type of…

Machine Learning · Computer Science · 2020-07-30 · Pratyush Maini, Eric Wong, J. Zico Kolter

Benchmarking Adversarial Robustness

Deep neural networks are vulnerable to adversarial examples, which becomes one of the most important research problems in the development of deep learning. While a lot of efforts have been made in recent years, it is of great significance…

Computer Vision and Pattern Recognition · Computer Science · 2019-12-30 · Yinpeng Dong, Qi-An Fu, Xiao Yang, Tianyu Pang, Hang Su, Zihao Xiao, Jun Zhu

Adversarial Training and Robustness for Multiple Perturbations

Defenses against adversarial examples, such as adversarial training, are typically tailored to a single perturbation type (e.g., small $\ell_\infty$-noise). For other perturbations, these defenses offer no guarantees and, at times, even…

Machine Learning · Computer Science · 2019-10-21 · Florian Tramèr, Dan Boneh

Res-Bench: Benchmarking the Robustness of Multimodal Large Language Models to Dynamic Resolution Input

Multimodal Large Language Models (MLLMs) increasingly support dynamic image resolutions. However, current evaluation paradigms primarily assess semantic performance, overlooking the critical question of resolution robustness - whether…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-17 · Chenxu Li, Zhicai Wang, Yuan Sheng, Xingyu Zhu, Yanbin Hao, Xiang Wang

Robustness of Large Language Models Against Adversarial Attacks

The increasing deployment of Large Language Models (LLMs) in various applications necessitates a rigorous evaluation of their robustness against adversarial attacks. In this paper, we present a comprehensive study on the robustness of GPT…

Computation and Language · Computer Science · 2024-12-24 · Yiyi Tao, Yixian Shen, Hang Zhang, Yanxin Shen, Lun Wang, Chuanqi Shi, Shaoshuai Du

Defenses in Adversarial Machine Learning: A Survey

Adversarial phenomenon has been widely observed in machine learning (ML) systems, especially in those using deep neural networks, describing that ML systems may produce inconsistent and incomprehensible predictions with humans at some…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-15 · Baoyuan Wu, Shaokui Wei, Mingli Zhu, Meixi Zheng, Zihao Zhu, Mingda Zhang, Hongrui Chen, Danni Yuan, Li Liu, Qingshan Liu