Related papers: Adversarial Training for Large Neural Language Mod…

200 papers

To be successful in single source domain generalization, maximizing diversity of synthesized domains has emerged as one of the most effective strategies. Many of the recent successes have come from methods that pre-specify the types of…

Machine Learning · Computer Science 2022-12-14 Tejas Gokhale, Rushil Anirudh, Jayaraman J. Thiagarajan, Bhavya Kailkhura, Chitta Baral, Yezhou Yang

Pretrained models from self-supervision are prevalently used in fine-tuning downstream tasks faster or for better accuracy. However, gaining robustness from pretraining is left unexplored. We introduce adversarial training into…

Computer Vision and Pattern Recognition · Computer Science 2020-03-31 Tianlong Chen, Sijia Liu, Shiyu Chang, Yu Cheng, Lisa Amini, Zhangyang Wang

As powerful Large Language Models (LLMs) are now widely used for numerous practical applications, their safety is of critical importance. While alignment techniques have significantly improved overall safety, LLMs remain vulnerable to…

Machine Learning · Computer Science 2024-10-28 Samuel Jacob Chacko, Sajib Biswas, Chashi Mahiul Islam, Fatema Tabassum Liza, Xiuwen Liu

We propose a simple and general method to regularize the fine-tuning of Transformer-based encoders for text classification tasks. Specifically, during fine-tuning we generate adversarial examples by perturbing the word embeddings of the…

Computation and Language · Computer Science 2022-02-21 Lin Pan, Chung-Wei Hang, Avirup Sil, Saloni Potdar

Deep neural networks are susceptible to adversarial attacks and common corruptions, which undermine their robustness. In order to enhance model resilience against such challenges, Adversarial Training (AT) has emerged as a prominent…

Machine Learning · Computer Science 2025-06-17 Tejaswini Medi, Steffen Jung, Margret Keuper

Large language models (LLMs) are vulnerable to adversarial attacks that can elicit harmful responses. Defending against such attacks remains challenging due to the opacity of jailbreaking mechanisms and the high computational cost of…

Machine Learning · Computer Science 2025-03-21 Lei Yu, Virginie Do, Karen Hambardzumyan, Nicola Cancedda

Classifiers such as deep neural networks have been shown to be vulnerable against adversarial perturbations on problems with high-dimensional input space. While adversarial training improves the robustness of image classifiers against such…

Computer Vision and Pattern Recognition · Computer Science 2019-08-14 Chaithanya Kumar Mummadi, Thomas Brox, Jan Hendrik Metzen

Adversarial training (AT) is an effective defense for large language models (LLMs) against jailbreak attacks, but performing AT on LLMs is costly. To improve the efficiency of AT for LLMs, recent studies propose continuous AT (CAT) that…

Machine Learning · Computer Science 2026-04-15 Shaopeng Fu, Di Wang

Pre-trained language models (PLMs) have been widely used to underpin various downstream tasks. However, adversarial attacks have shown that PLMs are vulnerable to small perturbations. Mainstream methods adopt a detached two-stage…

Computation and Language · Computer Science 2023-05-30 Xuanjie Fang, Sijie Cheng, Yang Liu, Wei Wang

Despite their impressive capabilities, Multimodal Large Language Models (MLLMs) exhibit perceptual fragility when confronted with visually complex scenes. This weakness stems from a reliance on finite training datasets, which are…

Machine Learning · Computer Science 2026-03-05 Yicheng Bao, Xuhong Wang, Qiaosheng Zhang, Chaochao Lu, Xia Hu, Xin Tan

Transfer learning from pretrained language models recently became the dominant approach for solving many NLP tasks. A common approach to transfer learning for multiple tasks that maximize parameter sharing trains one or more task-specific…

Computation and Language · Computer Science 2021-06-03 Karen Hambardzumyan, Hrant Khachatrian, Jonathan May

Deep neural networks are susceptible to adversarial examples, posing a significant security risk in critical applications. Adversarial Training (AT) is a well-established technique to enhance adversarial robustness, but it often comes at…

Machine Learning · Computer Science 2023-08-08 Kaijie Zhu, Jindong Wang, Xixu Hu, Xing Xie, Ge Yang

Deep neural networks are highly vulnerable to adversarial examples, i.e., small perturbations that can significantly degrade model performance. While adversarial training has become the primary defense strategy, most studies focus on…

Machine Learning · Computer Science 2026-05-14 Lilin Zhang, Yimo Guo, Yue Li, Jiancheng Shi, Xianggen Liu

As deep learning (DL) models are increasingly being integrated into our everyday lives, ensuring their safety by making them robust against adversarial attacks has become increasingly critical. DL models have been found to be susceptible to…

Machine Learning · Computer Science 2026-05-29 Hallgrimur Thorsteinsson, Valdemar J Henriksen, Daniel I R Cruz, Raghavendra Selvan, Tong Chen

Large vision models have been found vulnerable to adversarial examples, emphasizing the need for enhancing their adversarial robustness. While adversarial training is an effective defense for deep convolutional models, it often faces…

Computer Vision and Pattern Recognition · Computer Science 2024-10-10 Kangtao Lv, Huangsen Cao, Kainan Tu, Yihuai Xu, Zhimeng Zhang, Xin Ding, Yongwei Wang

Beyond the success story of adversarial training (AT) in the recent text domain on top of pre-trained language models (PLMs), our empirical study showcases the inconsistent gains from AT on some tasks, e.g. commonsense reasoning, named…

Computation and Language · Computer Science 2023-05-09 Hongqiu Wu, Yongxiang Liu, Hanwen Shi, Hai Zhao, Min Zhang

Adversarial examples are carefully perturbed inputs for fooling machine learning models. A well-acknowledged defense method against such examples is adversarial training, where adversarial examples are injected into training data to…

Machine Learning · Computer Science 2019-05-17 Bai Li, Changyou Chen, Wenlin Wang, Lawrence Carin

Language model pre-training, such as BERT, has achieved remarkable results in many NLP tasks. However, it is unclear why the pre-training-then-fine-tuning paradigm can improve performance and generalization capability across different…

Computation and Language · Computer Science 2019-08-16 Yaru Hao, Li Dong, Furu Wei, Ke Xu

Adversarial training for LLMs is one of the most promising methods to reliably improve robustness against adversaries. However, despite significant progress, models remain vulnerable to simple in-distribution exploits, such as rewriting…

Machine Learning · Computer Science 2026-02-19 Chengzhi Hu, Jonas Dornbusch, David Lüdke, Stephan Günnemann, Leo Schwinn

Deep neural networks are capable of training fast and generalizing well within many domains. Despite their promising performance, deep networks have shown sensitivities to perturbations of their inputs (e.g., adversarial examples) and their…

Machine Learning · Computer Science 2020-07-09 Justin Goodwin, Olivia Brown, Victoria Helus