Related papers: Adversarial Training for Large Neural Language Mod…

Generalist++: A Meta-learning Framework for Mitigating Trade-off in Adversarial Training

Despite the rapid progress of neural networks, they remain highly vulnerable to adversarial examples, for which adversarial training (AT) is currently the most effective defense. While AT has been extensively studied, its practical…

Machine Learning · Computer Science 2025-10-16 Yisen Wang, Yichuan Mo, Hongjun Wang, Junyi Li, Zhouchen Lin

Lagrangian Objective Function Leads to Improved Unforeseen Attack Generalization in Adversarial Training

Recent improvements in deep learning models and their practical applications have raised concerns about the robustness of these models against adversarial examples. Adversarial training (AT) has been shown effective to reach a robust model…

Machine Learning · Computer Science 2021-03-30 Mohammad Azizmalayeri, Mohammad Hossein Rohban

RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training

Fine-tuning pre-trained language models (LMs) has become the de facto standard in many NLP tasks. Nevertheless, fine-tuned LMs are still prone to robustness issues, such as adversarial robustness and model calibration. Several perspectives…

Computation and Language · Computer Science 2023-12-08 Jaehyung Kim, Yuning Mao, Rui Hou, Hanchao Yu, Davis Liang, Pascale Fung, Qifan Wang, Fuli Feng, Lifu Huang, Madian Khabsa

Adversarial Preference Learning for Robust LLM Alignment

Modern language models often rely on Reinforcement Learning from Human Feedback (RLHF) to encourage safe behaviors. However, they remain vulnerable to adversarial attacks due to three key limitations: (1) the inefficiency and high cost of…

Machine Learning · Computer Science 2025-06-02 Yuanfu Wang, Pengyu Wang, Chenyang Xi, Bo Tang, Junyi Zhu, Wenqiang Wei, Chen Chen, Chao Yang, Jingfeng Zhang, Chaochao Lu, Yijun Niu, Keming Mao, Zhiyu Li, Feiyu Xiong, Jie Hu, Mingchuan Yang

AT-BERT: Adversarial Training BERT for Acronym Identification Winning Solution for SDU@AAAI-21

Acronym identification focuses on finding the acronyms and the phrases that have been abbreviated, which is crucial for scientific document understanding tasks. However, the limited size of manually annotated datasets hinders further…

Computation and Language · Computer Science 2021-01-13 Danqing Zhu, Wangli Lin, Yang Zhang, Qiwei Zhong, Guanxiong Zeng, Weilin Wu, Jiayu Tang

Domain Adversarial Fine-Tuning as an Effective Regularizer

In Natural Language Processing (NLP), pretrained language models (LMs) that are transferred to downstream tasks have been recently shown to achieve state-of-the-art results. However, standard fine-tuning can degrade the general-domain…

Machine Learning · Computer Science 2020-10-07 Giorgos Vernikos, Katerina Margatina, Alexandra Chronopoulou, Ion Androutsopoulos

Revisiting the Relationship between Adversarial and Clean Training: Why Clean Training Can Make Adversarial Training Better

Adversarial training (AT) is an effective technique for enhancing adversarial robustness, but it usually comes at the cost of a decline in generalization ability. Recent studies have attempted to use clean training to assist adversarial…

Machine Learning · Computer Science 2025-04-02 MingWei Zhou, Xiaobing Pei

Convergence of Adversarial Training in Overparametrized Neural Networks

Neural networks are vulnerable to adversarial examples, i.e. inputs that are imperceptibly perturbed from natural data and yet incorrectly classified by the network. Adversarial training, a heuristic form of robust optimization that…

Machine Learning · Computer Science 2019-11-12 Ruiqi Gao, Tianle Cai, Haochuan Li, Liwei Wang, Cho-Jui Hsieh, Jason D. Lee

Randomized Adversarial Training via Taylor Expansion

In recent years, there has been an explosion of research into developing more robust deep neural networks against adversarial examples. Adversarial training appears as one of the most successful methods. To deal with both the robustness…

Machine Learning · Computer Science 2023-03-21 Gaojie Jin, Xinping Yi, Dengyu Wu, Ronghui Mu, Xiaowei Huang

Regional Adversarial Training for Better Robust Generalization

Adversarial training (AT) has been demonstrated as one of the most promising defense methods against various adversarial attacks. To our knowledge, existing AT-based methods usually train with the locally most adversarial perturbed points…

Computer Vision and Pattern Recognition · Computer Science 2021-09-07 Chuanbiao Song, Yanbo Fan, Yichen Yang, Baoyuan Wu, Yiming Li, Zhifeng Li, Kun He

An Adversarially-Learned Turing Test for Dialog Generation Models

The design of better automated dialogue evaluation metrics offers the potential to accelerate evaluation research on conversational AI. However, existing trainable dialogue evaluation models are generally restricted to classifiers trained…

Computation and Language · Computer Science 2021-04-19 Xiang Gao, Yizhe Zhang, Michel Galley, Bill Dolan

Self-Generative Adversarial Fine-Tuning for Large Language Models

Fine-tuning large language models (LLMs) for alignment typically relies on supervised fine-tuning or reinforcement learning from human feedback, both limited by the cost and scarcity of high-quality annotations. Recent self-play and…

Machine Learning · Computer Science 2026-02-03 Shiguang Wu, Yaqing Wang, Quanming Yao

Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training

Large Language Models (LLMs) exhibit substantial capabilities yet encounter challenges, including hallucination, outdated knowledge, and untraceable reasoning processes. Retrieval-augmented generation (RAG) has emerged as a promising…

Artificial Intelligence · Computer Science 2024-06-03 Feiteng Fang, Yuelin Bai, Shiwen Ni, Min Yang, Xiaojun Chen, Ruifeng Xu

A Survey of Robust Adversarial Training in Pattern Recognition: Fundamental, Theory, and Methodologies

In the last few decades, deep neural networks have achieved remarkable success in machine learning, computer vision, and pattern recognition. Recent studies, however, show that neural networks (both shallow and deep) may be easily fooled by…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Zhuang Qian, Kaizhu Huang, Qiu-Feng Wang, Xu-Yao Zhang

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

While existing work in robust deep learning has focused on small pixel-level norm-based perturbations, this may not account for perturbations encountered in several real-world settings. In many such cases although test data might not be…

Computer Vision and Pattern Recognition · Computer Science 2021-04-09 Tejas Gokhale, Rushil Anirudh, Bhavya Kailkhura, Jayaraman J. Thiagarajan, Chitta Baral, Yezhou Yang

Adversarial Analysis of Natural Language Inference Systems

The release of large natural language inference (NLI) datasets like SNLI and MNLI has led to rapid development and improvement of completely neural systems for the task. Most recently, heavily pre-trained, Transformer-based models like…

Computation and Language · Computer Science 2019-12-10 Tiffany Chien, Jugal Kalita

AdvSumm: Adversarial Training for Bias Mitigation in Text Summarization

Large Language Models (LLMs) have achieved impressive performance in text summarization and are increasingly deployed in real-world applications. However, these systems often inherit associative and framing biases from pre-training data,…

Computation and Language · Computer Science 2025-09-23 Mukur Gupta, Nikhil Reddy Varimalla, Nicholas Deas, Melanie Subbiah, Kathleen McKeown

ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise

In recent years, large pre-trained Transformer-based language models have led to dramatic improvements in many natural language understanding tasks. To train these models with increasing sizes, many neural network practitioners attempt to…

Machine Learning · Computer Science 2022-02-01 Minjia Zhang, Niranjan Uma Naresh, Yuxiong He

Adversarial Training Methods for Semi-Supervised Text Classification

Adversarial training provides a means of regularizing supervised learning algorithms while virtual adversarial training is able to extend supervised learning algorithms to the semi-supervised setting. However, both methods require making…

Machine Learning · Statistics 2021-11-17 Takeru Miyato, Andrew M. Dai, Ian Goodfellow

Adversarial NLI: A New Benchmark for Natural Language Understanding

We introduce a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure. We show that training models on this new dataset leads to state-of-the-art performance on a variety of…

Computation and Language · Computer Science 2020-05-07 Yixin Nie, Adina Williams, Emily Dinan, Mohit Bansal, Jason Weston, Douwe Kiela