Related papers: Reducing Exploitability with Population Based Trai…

Robust Reinforcement Learning using Adversarial Populations

Reinforcement Learning (RL) is an effective tool for controller design but can struggle with issues of robustness, failing catastrophically when the underlying system dynamics are perturbed. The Robust RL formulation tackles this by adding…

Machine Learning · Computer Science 2020-09-24 Eugene Vinitsky, Yuqing Du, Kanaad Parvate, Kathy Jang, Pieter Abbeel, Alexandre Bayen

Towards Understanding Fast Adversarial Training

Current neural-network-based classifiers are susceptible to adversarial examples. The most empirically successful approach to defending against such adversarial examples is adversarial training, which incorporates a strong self-attack…

Machine Learning · Computer Science 2020-06-08 Bai Li, Shiqi Wang, Suman Jana, Lawrence Carin

Adversarial Policies: Attacking Deep Reinforcement Learning

Deep reinforcement learning (RL) policies are known to be vulnerable to adversarial perturbations to their observations, similar to adversarial examples for classifiers. However, an attacker is not usually able to directly modify another…

Machine Learning · Computer Science 2021-01-19 Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell

Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation

This study investigates behavior-targeted attacks on reinforcement learning and their countermeasures. Behavior-targeted attacks aim to manipulate the victim's behavior as desired by the adversary through adversarial interventions in state…

Machine Learning · Computer Science 2026-02-18 Shojiro Yamabe, Kazuto Fukuchi, Jun Sakuma

Robust Reinforcement Learning through Efficient Adversarial Herding

Although reinforcement learning (RL) is considered the gold standard for policy design, it may not always provide a robust solution in various scenarios. This can result in severe performance degradation when the environment is exposed to…

Machine Learning · Computer Science 2023-06-14 Juncheng Dong, Hao-Lun Hsu, Qitong Gao, Vahid Tarokh, Miroslav Pajic

Improving Adversarial Robustness by Putting More Regularizations on Less Robust Samples

Adversarial training, which is to enhance robustness against adversarial attacks, has received much attention because it is easy to generate human-imperceptible perturbations of data to deceive a given deep neural network. In this paper, we…

Machine Learning · Statistics 2023-06-02 Dongyoon Yang, Insung Kong, Yongdai Kim

Self-supervised Adversarial Training

Recent work has demonstrated that neural networks are vulnerable to adversarial examples. To escape from the predicament, many works try to harden the model in various ways, in which adversarial training is an effective way which learns…

Machine Learning · Computer Science 2020-02-04 Kejiang Chen, Hang Zhou, Yuefeng Chen, Xiaofeng Mao, Yuhong Li, Yuan He, Hui Xue, Weiming Zhang, Nenghai Yu

Strength in Numbers: Trading-off Robustness and Computation via Adversarially-Trained Ensembles

While deep learning has led to remarkable results on a number of challenging problems, researchers have discovered a vulnerability of neural networks in adversarial settings, where small but carefully chosen perturbations to the input can…

Neural and Evolutionary Computing · Computer Science 2018-11-26 Edward Grefenstette, Robert Stanforth, Brendan O'Donoghue, Jonathan Uesato, Grzegorz Swirszcz, Pushmeet Kohli

Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in RL

Most existing works focus on direct perturbations to the victim's state/action or the underlying transition dynamics to demonstrate the vulnerability of reinforcement learning agents to adversarial attacks. However, such direct…

Machine Learning · Computer Science 2024-02-21 Xiangyu Liu, Souradip Chakraborty, Yanchao Sun, Furong Huang

Robustness, Privacy, and Generalization of Adversarial Training

Adversarial training can considerably robustify deep neural networks to resist adversarial attacks. However, some works suggested that adversarial training might comprise the privacy-preserving and generalization abilities. This paper…

Machine Learning · Computer Science 2020-12-29 Fengxiang He, Shaopeng Fu, Bohan Wang, Dacheng Tao

Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness

Adversarial training aims to defend against adversaries: malicious opponents whose sole aim is to harm predictive performance in any way possible. This presents a rather harsh perspective, which we assert results in unnecessarily…

Machine Learning · Computer Science 2025-06-10 Maayan Ehrenberg, Roy Ganz, Nir Rosenfeld

Robust off-policy Reinforcement Learning via Soft Constrained Adversary

Recently, robust reinforcement learning (RL) methods against input observation have garnered significant attention and undergone rapid evolution due to RL's potential vulnerability. Although these advanced methods have achieved reasonable…

Machine Learning · Computer Science 2024-09-04 Kosuke Nakanishi, Akihiro Kubo, Yuji Yasui, Shin Ishii

Adversarial Training Should Be Cast as a Non-Zero-Sum Game

One prominent approach toward resolving the adversarial vulnerability of deep neural networks is the two-player zero-sum paradigm of adversarial training, in which predictors are trained against adversarially chosen perturbations of data.…

Machine Learning · Computer Science 2024-03-20 Alexander Robey, Fabian Latorre, George J. Pappas, Hamed Hassani, Volkan Cevher

Multitask Learning Strengthens Adversarial Robustness

Although deep networks achieve strong accuracy on a range of computer vision benchmarks, they remain vulnerable to adversarial attacks, where imperceptible input perturbations fool the network. We present both theoretical and empirical…

Computer Vision and Pattern Recognition · Computer Science 2020-09-14 Chengzhi Mao, Amogh Gupta, Vikram Nitin, Baishakhi Ray, Shuran Song, Junfeng Yang, Carl Vondrick

Adversarial Training Can Hurt Generalization

While adversarial training can improve robust accuracy (against an adversary), it sometimes hurts standard accuracy (when there is no adversary). Previous work has studied this tradeoff between standard and robust accuracy, but only in the…

Machine Learning · Computer Science 2019-08-28 Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John C. Duchi, Percy Liang

Adversarial Training and Robustness for Multiple Perturbations

Defenses against adversarial examples, such as adversarial training, are typically tailored to a single perturbation type (e.g., small $\ell_\infty$-noise). For other perturbations, these defenses offer no guarantees and, at times, even…

Machine Learning · Computer Science 2019-10-21 Florian Tramèr, Dan Boneh

Recent Advances in Adversarial Training for Adversarial Robustness

Adversarial training is one of the most effective approaches defending against adversarial examples for deep learning models. Unlike other defense strategies, adversarial training aims to promote the robustness of models intrinsically.…

Machine Learning · Computer Science 2021-04-22 Tao Bai, Jinqi Luo, Jun Zhao, Bihan Wen, Qian Wang

Learning Diverse Risk Preferences in Population-based Self-play

Among the great successes of Reinforcement Learning (RL), self-play algorithms play an essential role in solving competitive games. Current self-play algorithms optimize the agent to maximize expected win-rates against its current or…

Machine Learning · Computer Science 2023-12-18 Yuhua Jiang, Qihan Liu, Xiaoteng Ma, Chenghao Li, Yiqin Yang, Jun Yang, Bin Liang, Qianchuan Zhao

Bridging Models to Defend: A Population-Based Strategy for Robust Adversarial Defense

Adversarial robustness is a critical measure of a neural network's ability to withstand adversarial attacks at inference time. While robust training techniques have improved defenses against individual $\ell_p$-norm attacks (e.g., $\ell_2$…

Artificial Intelligence · Computer Science 2025-08-26 Ren Wang, Yuxuan Li, Can Chen, Dakuo Wang, Jinjun Xiong, Pin-Yu Chen, Sijia Liu, Mohammad Shahidehpour, Alfred Hero

Can Adversarial Training Be Manipulated By Non-Robust Features?

Adversarial training, originally designed to resist test-time adversarial examples, has shown to be promising in mitigating training-time availability attacks. This defense ability, however, is challenged in this paper. We identify a novel…

Machine Learning · Computer Science 2022-10-11 Lue Tao, Lei Feng, Hongxin Wei, Jinfeng Yi, Sheng-Jun Huang, Songcan Chen