Related papers: Corruption-Robust Offline Reinforcement Learning w…

Corruption Robust Offline Reinforcement Learning with Human Feedback

We study data corruption robustness for reinforcement learning with human feedback (RLHF) in an offline setting. Given an offline dataset of pairs of trajectories along with feedback about human preferences, an $\varepsilon$-fraction of the…

Machine Learning · Computer Science 2024-02-13 Debmalya Mandal, Andi Nika, Parameswaran Kamalaruban, Adish Singla, Goran Radanović

Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes

Despite the significant interest and progress in reinforcement learning (RL) problems with adversarial corruption, current works are either confined to the linear setting or lead to an undesired $\tilde{O}(\sqrt{T}\zeta)$ regret bound,…

Machine Learning · Statistics 2024-02-13 Chenlu Ye, Wei Xiong, Quanquan Gu, Tong Zhang

Corruption-Robust Offline Reinforcement Learning

We study the adversarial robustness in offline reinforcement learning. Given a batch dataset consisting of tuples $(s, a, r, s')$, an adversary is allowed to arbitrarily modify $\epsilon$ fraction of the tuples. From the corrupted dataset…

Machine Learning · Computer Science 2021-06-15 Xuezhou Zhang, Yiding Chen, Jerry Zhu, Wen Sun

Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption

This study tackles the challenges of adversarial corruption in model-based reinforcement learning (RL), where the transition dynamics can be corrupted by an adversary. Existing studies on corruption-robust RL mostly focus on the setting of…

Machine Learning · Statistics 2024-07-23 Chenlu Ye, Jiafan He, Quanquan Gu, Tong Zhang

Sparse Offline Reinforcement Learning with Corruption Robustness

We investigate robustness to strong data corruption in offline sparse reinforcement learning (RL). In our setting, an adversary may arbitrarily perturb a fraction of the collected trajectories from a high-dimensional but sparse Markov…

Machine Learning · Statistics 2026-05-13 Nam Phuong Tran, Andi Nika, Goran Radanovic, Long Tran-Thanh, Debmalya Mandal

A Model Selection Approach for Corruption Robust Reinforcement Learning

We develop a model selection approach to tackle reinforcement learning with adversarial corruption in both transition and reward. For finite-horizon tabular MDPs, without prior knowledge on the total amount of corruption, our algorithm…

Machine Learning · Computer Science 2024-12-31 Chen-Yu Wei, Christoph Dann, Julian Zimmert

Corruption-robust Offline Multi-agent Reinforcement Learning From Human Feedback

We consider robustness against data corruption in offline multi-agent reinforcement learning from human feedback (MARLHF) under a strong-contamination model: given a dataset $D$ of trajectory-preference tuples (each preference being an…

Machine Learning · Computer Science 2026-04-10 Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Adish Singla, Goran Radanović

Enhancing Robustness of Offline Reinforcement Learning Under Data Corruption via Sharpness-Aware Minimization

Offline reinforcement learning (RL) is vulnerable to real-world data corruption, with even robust algorithms failing under challenging observation and mixture corruptions. We posit this failure stems from data corruption creating sharp…

Machine Learning · Computer Science 2026-04-08 Le Xu, Jiayu Chen

Towards Robust Offline Reinforcement Learning under Diverse Data Corruption

Offline reinforcement learning (RL) presents a promising approach for learning reinforced policies from offline datasets without the need for costly or unsafe interactions with the environment. However, datasets collected by humans in…

Machine Learning · Computer Science 2024-03-12 Rui Yang, Han Zhong, Jiawei Xu, Amy Zhang, Chongjie Zhang, Lei Han, Tong Zhang

Certified Robust Neural Networks: Generalization and Corruption Resistance

Recent work has demonstrated that robustness (to "corruption") can be at odds with generalization. Adversarial training, for instance, aims to reduce the problematic susceptibility of modern neural networks to small data perturbations.…

Machine Learning · Statistics 2023-05-19 Amine Bennouna, Ryan Lucas, Bart Van Parys

Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling

Learning policy from offline datasets through offline reinforcement learning (RL) holds promise for scaling data-driven decision-making while avoiding unsafe and costly online interactions. However, real-world data collected from sensors or…

Machine Learning · Computer Science 2025-03-04 Jiawei Xu, Rui Yang, Shuang Qiu, Feng Luo, Meng Fang, Baoxiang Wang, Lei Han

Improved Corruption Robust Algorithms for Episodic Reinforcement Learning

We study episodic reinforcement learning under unknown adversarial corruptions in both the rewards and the transition probabilities of the underlying system. We propose new algorithms which, compared to the existing results in (Lykouris et…

Machine Learning · Computer Science 2021-03-09 Yifang Chen, Simon S. Du, Kevin Jamieson

Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity

This paper concerns the central issues of model robustness and sample efficiency in offline reinforcement learning (RL), which aims to learn to perform decision making from history data without active exploration. Due to uncertainties and…

Machine Learning · Computer Science 2024-01-01 Laixi Shi, Yuejie Chi

Robust Offline Reinforcement Learning with Gradient Penalty and Constraint Relaxation

A promising paradigm for offline reinforcement learning (RL) is to constrain the learned policy to stay close to the dataset behaviors, known as policy constraint offline RL. However, existing works heavily rely on the purity of the data,…

Machine Learning · Computer Science 2022-10-20 Chengqian Gao, Ke Xu, Liu Liu, Deheng Ye, Peilin Zhao, Zhiqiang Xu

Corruption-Robust Linear Bandits: Minimax Optimality and Gap-Dependent Misspecification

In linear bandits, how can a learner effectively learn when facing corrupted rewards? While significant work has explored this question, a holistic understanding across different adversarial models and corruption measures is lacking, as is…

Machine Learning · Computer Science 2024-10-21 Haolin Liu, Artin Tajdini, Andrew Wagenmaker, Chen-Yu Wei

On Optimal Robustness to Adversarial Corruption in Online Decision Problems

This paper considers two fundamental sequential decision-making problems: the problem of prediction with expert advice and the multi-armed bandit problem. We focus on stochastic regimes in which an adversary may corrupt losses, and we…

Machine Learning · Statistics 2021-09-24 Shinji Ito

Settling the Sample Complexity of Model-Based Offline Reinforcement Learning

This paper is concerned with offline reinforcement learning (RL), which learns using pre-collected data without further exploration. Effective offline RL would be able to accommodate distribution shift and limited data coverage. However,…

Machine Learning · Statistics 2024-03-11 Gen Li, Laixi Shi, Yuxin Chen, Yuejie Chi, Yuting Wei

Corruption-robust exploration in episodic reinforcement learning

We initiate the study of multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system, extending recent results for the special case of stochastic…

Machine Learning · Computer Science 2023-11-02 Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun

Corruption-Tolerant Asynchronous Q-Learning with Near-Optimal Rates

We study the problem of learning the optimal policy in a discounted, infinite-horizon reinforcement learning (RL) setting in the presence of adversarially corrupted rewards. To address this problem, we develop a novel robust variant of the…

Machine Learning · Computer Science 2026-05-22 Sreejeet Maity, Aritra Mitra

Robust Policy Gradient against Strong Data Corruption

We study the problem of robust reinforcement learning under adversarial corruption on both rewards and transitions. Our attack model assumes an adaptive adversary who can arbitrarily corrupt the reward and transition at every step…

Machine Learning · Computer Science 2021-06-09 Xuezhou Zhang, Yiding Chen, Xiaojin Zhu, Wen Sun