Related papers: Adversarial Training for Large Neural Language Mod…

A Simple Fine-tuning Is All You Need: Towards Robust Deep Learning Via Adversarial Fine-tuning

Adversarial Training (AT) with Projected Gradient Descent (PGD) is an effective approach for improving the robustness of the deep neural networks. However, PGD AT has been shown to suffer from two main limitations: i) high computational…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-29 · Ahmadreza Jeddi, Mohammad Javad Shafiee, Alexander Wong

Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models

Neural networks are vulnerable to adversarial attacks -- small visually imperceptible crafted noise which when added to the input drastically changes the output. The most effective method of defending against these adversarial attacks is to…

Machine Learning · Computer Science · 2019-06-27 · Mayank Singh, Abhishek Sinha, Nupur Kumari, Harshitha Machiraju, Balaji Krishnamurthy, Vineeth N Balasubramanian

DarkLLM: Learning Language-Driven Adversarial Attacks with Large Language Models

While vision and multimodal foundation models underpin critical tasks from perception to complex reasoning, they remain highly vulnerable to adversarial attacks. However, traditional adversarial attacks are typically limited to single,…

Cryptography and Security · Computer Science · 2026-05-20 · Ye Sun, Xin Wang, Jiaming Zhang, Yifeng Gao, Yixu Wang, Yifan Ding, Qixian Zhang, Henghui Ding, Xingjun Ma, Yu-Gang Jiang

AccelAT: A Framework for Accelerating the Adversarial Training of Deep Neural Networks through Accuracy Gradient

Adversarial training is exploited to develop a robust Deep Neural Network (DNN) model against the malicious altered data. These attacks may have catastrophic effects on DNN models but are indistinguishable for a human being. For example, an…

Machine Learning · Computer Science · 2022-10-14 · Farzad Nikfam, Alberto Marchisio, Maurizio Martina, Muhammad Shafique

Adversarial Training: A Survey

Adversarial training (AT) refers to integrating adversarial examples -- inputs altered with imperceptible perturbations that can significantly impact model predictions -- into the training process. Recent studies have demonstrated the…

Machine Learning · Computer Science · 2024-10-22 · Mengnan Zhao, Lihe Zhang, Jingwen Ye, Huchuan Lu, Baocai Yin, Xinchao Wang

Adversarial training for multi-context joint entity and relation extraction

Adversarial training (AT) is a regularization method that can be used to improve the robustness of neural network methods by adding small perturbations in the training data. We show how to use AT for the tasks of entity recognition and…

Computation and Language · Computer Science · 2019-01-15 · Giannis Bekoulis, Johannes Deleu, Thomas Demeester, Chris Develder

Adversarial Attacks and Defenses in Large Language Models: Old and New Threats

Over the past decade, there has been extensive research aimed at enhancing the robustness of neural networks, yet this problem remains vastly unsolved. Here, one major impediment has been the overestimation of the robustness of new defense…

Artificial Intelligence · Computer Science · 2023-10-31 · Leo Schwinn, David Dobre, Stephan Günnemann, Gauthier Gidel

Recent Advances in Adversarial Training for Adversarial Robustness

Adversarial training is one of the most effective approaches defending against adversarial examples for deep learning models. Unlike other defense strategies, adversarial training aims to promote the robustness of models intrinsically.…

Machine Learning · Computer Science · 2021-04-22 · Tao Bai, Jinqi Luo, Jun Zhao, Bihan Wen, Qian Wang

Adversarial Distributional Training for Robust Deep Learning

Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples. However, most existing AT methods adopt a specific attack to craft adversarial examples,…

Machine Learning · Computer Science · 2020-11-20 · Yinpeng Dong, Zhijie Deng, Tianyu Pang, Hang Su, Jun Zhu

Targeted Adversarial Training for Natural Language Understanding

We present a simple yet effective Targeted Adversarial Training (TAT) algorithm to improve adversarial training for natural language understanding. The key idea is to introspect current mistakes and prioritize adversarial training steps to…

Computation and Language · Computer Science · 2021-04-14 · Lis Pereira, Xiaodong Liu, Hao Cheng, Hoifung Poon, Jianfeng Gao, Ichiro Kobayashi

Adversarial Fine-tune with Dynamically Regulated Adversary

Adversarial training is an effective method to boost model robustness to malicious, adversarial attacks. However, such improvement in model robustness often leads to a significant sacrifice of standard performance on clean images. In many…

Machine Learning · Computer Science · 2022-04-29 · Pengyue Hou, Ming Zhou, Jie Han, Petr Musilek, Xingyu Li

Efficient Adversarial Training in LLMs with Continuous Attacks

Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails. In many domains, adversarial training has proven to be one of the most promising methods to reliably improve robustness against such…

Machine Learning · Computer Science · 2024-11-04 · Sophie Xhonneux, Alessandro Sordoni, Stephan Günnemann, Gauthier Gidel, Leo Schwinn

CAT: Customized Adversarial Training for Improved Robustness

Adversarial training has become one of the most effective methods for improving robustness of neural networks. However, it often suffers from poor generalization on both clean and perturbed data. In this paper, we propose a new algorithm,…

Machine Learning · Computer Science · 2020-02-19 · Minhao Cheng, Qi Lei, Pin-Yu Chen, Inderjit Dhillon, Cho-Jui Hsieh

ASAT: Adaptively Scaled Adversarial Training in Time Series

Adversarial training is a method for enhancing neural networks to improve the robustness against adversarial examples. Besides the security concerns of potential adversarial examples, adversarial training can also improve the generalization…

Machine Learning · Computer Science · 2022-12-21 · Zhiyuan Zhang, Wei Li, Ruihan Bao, Keiko Harimoto, Yunfang Wu, Xu Sun

Scaling Adversarial Training to Large Perturbation Bounds

The vulnerability of Deep Neural Networks to Adversarial Attacks has fuelled research towards building robust models. While most Adversarial Training algorithms aim at defending attacks constrained within low magnitude Lp norm bounds,…

Machine Learning · Computer Science · 2022-10-19 · Sravanti Addepalli, Samyak Jain, Gaurang Sriramanan, R. Venkatesh Babu

Towards Alternative Techniques for Improving Adversarial Robustness: Analysis of Adversarial Training at a Spectrum of Perturbations

Adversarial training (AT) and its variants have spearheaded progress in improving neural network robustness to adversarial perturbations and common corruptions in the last few years. Algorithm design of AT and its variants are focused on…

Machine Learning · Computer Science · 2022-06-15 · Kaustubh Sridhar, Souradeep Dutta, Ramneet Kaur, James Weimer, Oleg Sokolsky, Insup Lee

Adversarial Defence without Adversarial Defence: Enhancing Language Model Robustness via Instance-level Principal Component Removal

Pre-trained language models (PLMs) have driven substantial progress in natural language processing but remain vulnerable to adversarial attacks, raising concerns about their robustness in real-world applications. Previous studies have…

Computation and Language · Computer Science · 2025-10-17 · Yang Wang, Chenghao Xiao, Yizhi Li, Stuart E. Middleton, Noura Al Moubayed, Chenghua Lin

Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge

Adversarial examples are inputs to machine learning models designed to cause the model to make a mistake. They are useful for understanding the shortcomings of machine learning models, interpreting their results, and for regularisation. In…

Machine Learning · Computer Science · 2018-08-28 · Pasquale Minervini, Sebastian Riedel

Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality

Adversarial training is a popular method to give neural nets robustness against adversarial perturbations. In practice adversarial training leads to low robust training loss. However, a rigorous explanation for why this happens under…

Machine Learning · Computer Science · 2020-02-25 · Yi Zhang, Orestis Plevrakis, Simon S. Du, Xingguo Li, Zhao Song, Sanjeev Arora

Large Language Models as Robust Data Generators in Software Analytics: Are We There Yet?

Large Language Model (LLM)-generated data is increasingly used in software analytics, but it is unclear how this data compares to human-written data, particularly when models are exposed to adversarial scenarios. Adversarial attacks can…

Software Engineering · Computer Science · 2025-05-07 · Md. Abdul Awal, Mrigank Rochan, Chanchal K. Roy