Related papers: Adversarial Training for Large Neural Language Mod…

Adversarial and Natural Perturbations for General Robustness

In this paper we aim to explore the general robustness of neural network classifiers by utilizing adversarial as well as natural perturbations. Different from previous works which mainly focus on studying the robustness of neural networks…

Computer Vision and Pattern Recognition · Computer Science 2020-10-06 Sadaf Gulshad, Jan Hendrik Metzen, Arnold Smeulders

Improving adversarial robustness of deep neural networks by using semantic information

The vulnerability of deep neural networks (DNNs) to adversarial attack, which is an attack that can mislead state-of-the-art classifiers into making an incorrect classification with high confidence by deliberately perturbing the original…

Machine Learning · Computer Science 2021-06-18 Lina Wang, Rui Tang, Yawei Yue, Xingshu Chen, Wei Wang, Yi Zhu, Xuemei Zeng

Addressing Neural Network Robustness with Mixup and Targeted Labeling Adversarial Training

Despite their performance, Artificial Neural Networks are not reliable enough for most industrial applications. They are sensitive to noise, rotations, blur and adversarial examples. There is a need to build defenses that protect…

Machine Learning · Computer Science 2020-08-20 Alfred Laugros, Alice Caplier, Matthieu Ospici

Adversarial Training is Not Ready for Robot Learning

Adversarial training is an effective method to train deep learning models that are resilient to norm-bounded perturbations, at the cost of a drop in nominal performance. While adversarial training appears to enhance the robustness and safety…

Machine Learning · Computer Science 2021-03-16 Mathias Lechner, Ramin Hasani, Radu Grosu, Daniela Rus, Thomas A. Henzinger

Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models?

Large Language Models (LLMs) have demonstrated impressive capabilities in natural language tasks, but their safety and morality remain contentious due to their training on internet text corpora. To address these concerns, alignment…

Computation and Language · Computer Science 2024-08-06 Mohammad Bahrami Karkevandi, Nishant Vishwamitra, Peyman Najafirad

QAGAN: Adversarial Approach To Learning Domain Invariant Language Features

Training models that are robust to data domain shift has gained increasing interest in both academia and industry. Question-Answering language models, being one of the typical problems in Natural Language Processing (NLP) research, has…

Computation and Language · Computer Science 2022-06-27 Shubham Shrivastava, Kaiyue Wang

Attention Meets Perturbations: Robust and Interpretable Attention with Adversarial Training

Although attention mechanisms have been applied to a variety of deep learning models and have been shown to improve prediction performance, they have been reported to be vulnerable to perturbations to the mechanism. To overcome the…

Computation and Language · Computer Science 2022-11-23 Shunsuke Kitada, Hitoshi Iyatomi

Generalization Bounds for Adversarial Contrastive Learning

Deep networks are well-known to be fragile to adversarial attacks, and adversarial training is one of the most popular methods used to train a robust model. To take advantage of unlabeled data, recent works have applied adversarial training…

Machine Learning · Computer Science 2023-02-22 Xin Zou, Weiwei Liu

Boosting Adversarial Training via Fisher-Rao Norm-based Regularization

Adversarial training is extensively utilized to improve the adversarial robustness of deep neural networks. Yet, mitigating the degradation of standard generalization performance in adversarial-trained models remains an open problem. This…

Machine Learning · Computer Science 2024-03-27 Xiangyu Yin, Wenjie Ruan

Adversarial Training Improves Generalization Under Distribution Shifts in Bioacoustics

Adversarial training is a promising strategy for enhancing model robustness against adversarial attacks. However, its impact on generalization under substantial data distribution shifts in audio classification remains largely unexplored. To…

Machine Learning · Computer Science 2025-07-21 René Heinrich, Lukas Rauch, Bernhard Sick, Christoph Scholz

Adversarial Training for Multilingual Acoustic Modeling

Multilingual training has been shown to improve acoustic modeling performance by sharing and transferring knowledge in modeling different languages. Knowledge sharing is usually achieved by using common lower-level layers for different…

Computation and Language · Computer Science 2019-06-18 Ke Hu, Hasim Sak, Hank Liao

A Study on FGSM Adversarial Training for Neural Retrieval

Neural retrieval models have acquired significant effectiveness gains over the last few years compared to term-based methods. Nevertheless, those models may be brittle when faced with typos, distribution shifts or vulnerable to malicious…

Information Retrieval · Computer Science 2023-01-26 Simon Lupart, Stéphane Clinchant

Adversarial Concurrent Training: Optimizing Robustness and Accuracy Trade-off of Deep Neural Networks

Adversarial training has been proven to be an effective technique for improving the adversarial robustness of models. However, there seems to be an inherent trade-off between optimizing the model for accuracy and robustness. To this end, we…

Computer Vision and Pattern Recognition · Computer Science 2020-08-20 Elahe Arani, Fahad Sarfraz, Bahram Zonooz

Calibrated Adversarial Training

Adversarial training is an approach of increasing the robustness of models to adversarial attacks by including adversarial examples in the training set. One major challenge of producing adversarial examples is to contain sufficient…

Machine Learning · Computer Science 2021-10-13 Tianjin Huang, Vlado Menkovski, Yulong Pei, Mykola Pechenizkiy

Hard Adversarial Example Mining for Improving Robust Fairness

Adversarial training (AT) is widely considered the state-of-the-art technique for improving the robustness of deep neural networks (DNNs) against adversarial examples (AE). Nevertheless, recent studies have revealed that adversarially…

Machine Learning · Computer Science 2023-08-04 Chenhao Lin, Xiang Ji, Yulong Yang, Qian Li, Chao Shen, Run Wang, Liming Fang

Towards Deep Learning Models Resistant to Adversarial Attacks

Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples: inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings…

Machine Learning · Statistics 2019-09-06 Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu

Adversarial Contrastive Pre-training for Protein Sequences

Recent developments in Natural Language Processing (NLP) demonstrate that large-scale, self-supervised pre-training can be extremely beneficial for downstream tasks. These ideas have been adapted to other domains, including the analysis of…

Computation and Language · Computer Science 2021-02-02 Matthew B. A. McDermott, Brendan Yap, Harry Hsu, Di Jin, Peter Szolovits

Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks

Large Language Models (LLMs) are swiftly advancing in architecture and capability, and as they integrate more deeply into complex systems, the urgency to scrutinize their security properties grows. This paper surveys research in the…

Computation and Language · Computer Science 2023-10-18 Erfan Shayegani, Md Abdullah Al Mamun, Yu Fu, Pedram Zaree, Yue Dong, Nael Abu-Ghazaleh

Implicit Generative Modeling of Random Noise during Training for Adversarial Robustness

We introduce a Noise-based prior Learning (NoL) approach for training neural networks that are intrinsically robust to adversarial attacks. We find that the implicit generative modeling of random noise with the same loss function used…

Machine Learning · Computer Science 2019-06-04 Priyadarshini Panda, Kaushik Roy

Information Theoretic Adversarial Training of Large Language Models

Large language models (LLMs) remain vulnerable to adversarial prompting despite advances in alignment and safety, often exhibiting harmful behaviors under novel attack strategies. While adversarial training can improve robustness, existing…

Machine Learning · Computer Science 2026-05-08 Yiwei Zhang, Jeremiah Birrell, Reza Ebrahimi, Rouzbeh Behnia, Jason Pacheco, Elisa Bertino