Related papers: Adversarial Training for Large Neural Language Mod…

In and Out-of-Domain Text Adversarial Robustness via Label Smoothing

Recently it has been shown that state-of-the-art NLP models are vulnerable to adversarial attacks, where the predictions of a model can be drastically altered by slight modifications to the input (such as synonym substitutions). While…

Computation and Language · Computer Science 2023-07-13 Yahan Yang, Soham Dan, Dan Roth, Insup Lee

Adversarial Training in Low-Label Regimes with Margin-Based Interpolation

Adversarial training has emerged as an effective approach to train robust neural network models that are resistant to adversarial attacks, even in low-label regimes where labeled data is scarce. In this paper, we introduce a novel…

Machine Learning · Computer Science 2024-11-28 Tian Ye, Rajgopal Kannan, Viktor Prasanna

Towards Deep Learning Models Resistant to Large Perturbations

Adversarial robustness has proven to be a required property of machine learning algorithms. A key and often overlooked aspect of this problem is to try to make the adversarial noise magnitude as large as possible to enhance the benefits of…

Machine Learning · Statistics 2020-03-31 Amirreza Shaeiri, Rozhin Nobahari, Mohammad Hossein Rohban

Improved Generalization Bounds for Adversarially Robust Learning

We consider a model of robust learning in an adversarial environment. The learner gets uncorrupted training data with access to possible corruptions that may be affected by the adversary during testing. The learner's goal is to build a…

Machine Learning · Computer Science 2022-07-04 Idan Attias, Aryeh Kontorovich, Yishay Mansour

Average Margin Regularization for Classifiers

Adversarial robustness has become an important research topic given empirical demonstrations on the lack of robustness of deep neural networks. Unfortunately, recent theoretical results suggest that adversarial training induces a strict…

Machine Learning · Computer Science 2020-03-25 Matt Olfat, Anil Aswani

Learning Mechanism Underlying NLP Pre-Training and Fine-Tuning

Natural language processing (NLP) enables the understanding and generation of meaningful human language, typically using a pre-trained complex architecture on a large dataset to learn the language and next fine-tune its weights to implement…

Computation and Language · Computer Science 2025-09-04 Yarden Tzach, Ronit D. Gross, Ella Koresh, Shalom Rosner, Or Shpringer, Tal Halevi, Ido Kanter

Second Order Optimization for Adversarial Robustness and Interpretability

Deep neural networks are easily fooled by small perturbations known as adversarial attacks. Adversarial Training (AT) is a technique aimed at learning features robust to such attacks and is widely regarded as a very effective defense.…

Machine Learning · Computer Science 2020-09-11 Theodoros Tsiligkaridis, Jay Roberts

Policy Disruption in Reinforcement Learning: Adversarial Attack with Large Language Models and Critical State Identification

Reinforcement learning (RL) has achieved remarkable success in fields like robotics and autonomous driving, but adversarial attacks designed to mislead RL systems remain challenging. Existing approaches often rely on modifying the…

Machine Learning · Computer Science 2025-07-25 Junyong Jiang, Buwei Tian, Chenxing Xu, Songze Li, Lu Dong

Failure Cases Are Better Learned But Boundary Says Sorry: Facilitating Smooth Perception Change for Accuracy-Robustness Trade-Off in Adversarial Training

Adversarial Training (AT) is one of the most effective methods to train robust Deep Neural Networks (DNNs). However, AT creates an inherent trade-off between clean accuracy and adversarial robustness, which is commonly attributed to the…

Computer Vision and Pattern Recognition · Computer Science 2025-08-05 Yanyun Wang, Li Liu

Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples

Adversarial training and its variants have become de facto standards for learning robust deep neural networks. In this paper, we explore the landscape around adversarial training in a bid to uncover its limits. We systematically study the…

Machine Learning · Statistics 2021-03-31 Sven Gowal, Chongli Qin, Jonathan Uesato, Timothy Mann, Pushmeet Kohli

Improving the Generalization of Adversarial Training with Domain Adaptation

By injecting adversarial examples into training data, adversarial training is promising for improving the robustness of deep learning models. However, most existing adversarial training approaches are based on a specific type of adversarial…

Machine Learning · Computer Science 2019-03-18 Chuanbiao Song, Kun He, Liwei Wang, John E. Hopcroft

Adversarial Margin Maximization Networks

The tremendous recent success of deep neural networks (DNNs) has sparked a surge of interest in understanding their predictive ability. Unlike the human visual system which is able to generalize robustly and learn with little supervision,…

Machine Learning · Computer Science 2019-11-15 Ziang Yan, Yiwen Guo, Changshui Zhang

Adversarial Training: embedding adversarial perturbations into the parameter space of a neural network to build a robust system

Adversarial training, in which a network is trained on both adversarial and clean examples, is one of the most trusted defense methods against adversarial attacks. However, there are three major practical difficulties in implementing and…

Machine Learning · Computer Science 2019-10-11 Shixian Wen, Laurent Itti

MixAT: Combining Continuous and Discrete Adversarial Training for LLMs

Despite recent efforts in Large Language Model (LLM) safety and alignment, current adversarial attacks on frontier LLMs can still consistently force harmful generations. Although adversarial training has been widely studied and shown to…

Machine Learning · Computer Science 2025-10-29 Csaba Dékány, Stefan Balauca, Robin Staab, Dimitar I. Dimitrov, Martin Vechev

Adversarial Training for Machine Reading Comprehension with Virtual Embeddings

Adversarial training (AT) as a regularization method has proved its effectiveness on various tasks. Though there are successful applications of AT on some NLP tasks, the distinguishing characteristics of NLP tasks have not been exploited.…

Computation and Language · Computer Science 2021-11-29 Ziqing Yang, Yiming Cui, Chenglei Si, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu

Fight Back Against Jailbreaking via Prompt Adversarial Tuning

While Large Language Models (LLMs) have achieved tremendous success in various applications, they are also susceptible to jailbreaking attacks. Several primary defense strategies have been proposed to protect LLMs from producing harmful…

Machine Learning · Computer Science 2024-11-01 Yichuan Mo, Yuji Wang, Zeming Wei, Yisen Wang

Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples

Recent advances in the development of large language models have resulted in public access to state-of-the-art pre-trained language models (PLMs), including Generative Pre-trained Transformer 3 (GPT-3) and Bidirectional Encoder…

Computation and Language · Computer Science 2022-09-07 Hezekiah J. Branch, Jonathan Rodriguez Cefalu, Jeremy McHugh, Leyla Hujer, Aditya Bahl, Daniel del Castillo Iglesias, Ron Heichman, Ramesh Darwishi

Efficient Adversarial Training with Robust Early-Bird Tickets

Adversarial training is one of the most powerful methods to improve the robustness of pre-trained language models (PLMs). However, this approach is typically more expensive than traditional fine-tuning because of the necessity to generate…

Computation and Language · Computer Science 2022-12-01 Zhiheng Xi, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

A Prompting-based Approach for Adversarial Example Generation and Robustness Enhancement

Recent years have seen the wide application of NLP models in crucial areas such as finance, medical treatment, and news media, raising concerns of the model robustness and vulnerabilities. In this paper, we propose a novel prompt-based…

Computation and Language · Computer Science 2022-03-22 Yuting Yang, Pei Huang, Juan Cao, Jintao Li, Yun Lin, Jin Song Dong, Feifei Ma, Jian Zhang

Partially Recentralization Softmax Loss for Vision-Language Models Robustness

As Large Language Models make a breakthrough in natural language processing tasks (NLP), multimodal technique becomes extremely popular. However, it has been shown that multimodal NLP are vulnerable to adversarial attacks, where the outputs…

Computation and Language · Computer Science 2026-03-16 Hao Wang, Jinzhe Jiang, Xin Zhang, Chen Li