Related papers: State Regularized Policy Optimization on Data with…

Policy Regularized Distributionally Robust Markov Decision Processes with Linear Function Approximation

Decision-making under distribution shift is a central challenge in reinforcement learning (RL), where training and deployment environments differ. We study this problem through the lens of robust Markov decision processes (RMDPs), which…

Machine Learning · Computer Science 2025-10-17 Jingwen Gu, Yiting He, Zhishuai Liu, Pan Xu

OMPO: A Unified Framework for RL under Policy and Dynamics Shifts

Training reinforcement learning policies using environment interaction data collected from varying policies or dynamics presents a fundamental challenge. Existing works often overlook the distribution discrepancies induced by policy or…

Machine Learning · Computer Science 2024-05-30 Yu Luo, Tianying Ji, Fuchun Sun, Jianwei Zhang, Huazhe Xu, Xianyuan Zhan

Survival of the Fittest: Evolutionary Adaptation of Policies for Environmental Shifts

Reinforcement learning (RL) has been successfully applied to solve the problem of finding obstacle-free paths for autonomous agents operating in stochastic and uncertain environments. However, when the underlying stochastic dynamics of the…

Machine Learning · Computer Science 2024-10-29 Sheryl Paul, Jyotirmoy V. Deshmukh

Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization

Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments. This discrepancy is commonly viewed as the disturbance in transition dynamics. Many existing…

Machine Learning · Computer Science 2021-12-21 Yufei Kuang, Miao Lu, Jie Wang, Qi Zhou, Bin Li, Houqiang Li

Robustness and risk management via distributional dynamic programming

In dynamic programming (DP) and reinforcement learning (RL), an agent learns to act optimally in terms of expected long-term return by sequentially interacting with its environment modeled by a Markov decision process (MDP). More generally…

Machine Learning · Computer Science 2022-01-03 Mastane Achab, Gergely Neu

Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning

Offline reinforcement learning (RL) extends the paradigm of classical RL algorithms to purely learning from static datasets, without interacting with the underlying environment during the learning process. A key challenge of offline RL is…

Machine Learning · Computer Science 2022-06-16 Shentao Yang, Yihao Feng, Shujian Zhang, Mingyuan Zhou

Federated Offline Policy Optimization with Dual Regularization

Federated Reinforcement Learning (FRL) has been deemed as a promising solution for intelligent decision-making in the era of Artificial Internet of Things. However, existing FRL approaches often entail repeated interactions with the…

Machine Learning · Computer Science 2024-05-30 Sheng Yue, Zerui Qin, Xingyuan Hua, Yongheng Deng, Ju Ren

MOPO: Model-based Offline Policy Optimization

Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data. This problem setting offers the promise of utilizing such datasets to acquire policies without any…

Machine Learning · Computer Science 2020-11-24 Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma

Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection

Non-stationary environments are challenging for reinforcement learning algorithms. If the state transition and/or reward functions change based on latent factors, the agent is effectively tasked with optimizing a behavior that maximizes…

Machine Learning · Computer Science 2021-05-21 Lucas N. Alegre, Ana L. C. Bazzan, Bruno C. da Silva

State-wise Constrained Policy Optimization

Reinforcement Learning (RL) algorithms have shown tremendous success in simulation environments, but their application to real-world problems faces significant challenges, with safety being a major concern. In particular, enforcing…

Machine Learning · Computer Science 2024-06-19 Weiye Zhao, Rui Chen, Yifan Sun, Tianhao Wei, Changliu Liu

How to pick the domain randomization parameters for sim-to-real transfer of reinforcement learning policies?

Recently, reinforcement learning (RL) algorithms have demonstrated remarkable success in learning complicated behaviors from minimally processed input. However, most of this success is limited to simulation. While there are promising…

Machine Learning · Computer Science 2019-03-29 Quan Vuong, Sharad Vikram, Hao Su, Sicun Gao, Henrik I. Christensen

Stable Policy Optimization via Off-Policy Divergence Regularization

Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL). While these methods achieve state-of-the-art performance across a…

Machine Learning · Computer Science 2020-06-22 Ahmed Touati, Amy Zhang, Joelle Pineau, Pascal Vincent

Towards an Understanding of Default Policies in Multitask Policy Optimization

Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms with strong performance across multiple domains. In this family of methods, agents are trained to maximize…

Machine Learning · Computer Science 2022-03-24 Ted Moskovitz, Michael Arbel, Jack Parker-Holder, Aldo Pacchiano

Constrained Reinforcement Learning Under Model Mismatch

Existing studies on constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment. However, when deployed in a real environment, it may easily violate constraints that were originally satisfied…

Machine Learning · Computer Science 2024-05-06 Zhongchang Sun, Sihong He, Fei Miao, Shaofeng Zou

Deep Reinforcement Learning with Robust and Smooth Policy

Deep reinforcement learning (RL) has achieved great empirical successes in various domains. However, the large search space of neural networks requires a large amount of data, which makes the current RL algorithms not sample efficient.…

Machine Learning · Computer Science 2020-08-18 Qianli Shen, Yan Li, Haoming Jiang, Zhaoran Wang, Tuo Zhao

Model-Based Offline Meta-Reinforcement Learning with Regularization

Existing offline reinforcement learning (RL) methods face a few major challenges, particularly the distributional shift between the learned policy and the behavior policy. Offline Meta-RL is emerging as a promising approach to address these…

Machine Learning · Computer Science 2022-07-14 Sen Lin, Jialin Wan, Tengyu Xu, Yingbin Liang, Junshan Zhang

BRPO: Batch Residual Policy Optimization

In batch reinforcement learning (RL), one often constrains a learned policy to be close to the behavior (data-generating) policy, e.g., by constraining the learned action distribution to differ from the behavior policy by some maximum…

Machine Learning · Computer Science 2020-03-31 Sungryull Sohn, Yinlam Chow, Jayden Ooi, Ofir Nachum, Honglak Lee, Ed Chi, Craig Boutilier

DYNAMIX: RL-based Adaptive Batch Size Optimization in Distributed Machine Learning Systems

Existing batch size selection approaches in distributed machine learning rely on static allocation or simplistic heuristics that fail to adapt to heterogeneous, dynamic computing environments. We present DYNAMIX, a reinforcement learning…

Machine Learning · Computer Science 2025-10-10 Yuanjun Dai, Keqiang He, An Wang

Learn Dynamic-Aware State Embedding for Transfer Learning

Transfer reinforcement learning aims to improve the sample efficiency of solving unseen new tasks by leveraging experiences obtained from previous tasks. We consider the setting where all tasks (MDPs) share the same environment dynamic…

Machine Learning · Computer Science 2021-01-08 Kaige Yang

Robust Transfer Learning with Side Information

Robust Markov Decision Processes (MDPs) address environmental shift through distributionally robust optimization (DRO) by finding an optimal worst-case policy within an uncertainty set of transition kernels. However, standard DRO approaches…

Machine Learning · Statistics 2026-03-10 Akram S. Awad, Shihab Ahmed, Yue Wang, George K. Atia