
Related papers: Policy Regularized Distributionally Robust Markov …

200 papers

The Robust Regularized Markov Decision Process (RRMDP) is proposed to learn policies robust to dynamics shifts by adding regularization to the transition dynamics in the value function. Existing methods mostly use unstructured…

Machine Learning · Computer Science · 2025-11-03 · Cheng Tang, Zhishuai Liu, Pan Xu

In dynamic programming (DP) and reinforcement learning (RL), an agent learns to act optimally in terms of expected long-term return by sequentially interacting with its environment modeled by a Markov decision process (MDP). More generally…

Machine Learning · Computer Science · 2022-01-03 · Mastane Achab, Gergely Neu

We study off-dynamics Reinforcement Learning (RL), where the policy is trained on a source domain and deployed to a distinct target domain. We aim to solve this problem via online distributionally robust Markov decision processes (DRMDPs),…

Machine Learning · Computer Science · 2024-02-26 · Zhishuai Liu, Pan Xu

In this paper, we focus on the problem of robustifying reinforcement learning (RL) algorithms with respect to model uncertainties. Indeed, in the framework of model-based RL, we propose to merge the theory of constrained Markov decision…

Machine Learning · Computer Science · 2020-10-13 · Reazul Hasan Russel, Mouhacine Benosman, Jeroen Van Baar

We study the offline data-driven sequential decision making problem in the framework of Markov decision process (MDP). In order to enhance the generalizability and adaptivity of the learned policy, we propose to evaluate each policy by a…

Statistics Theory · Mathematics · 2021-11-11 · Zhengling Qi, Peng Liao

Stochastic and soft optimal policies resulting from entropy-regularized Markov decision processes (ER-MDP) are desirable for exploration and imitation learning applications. Motivated by the fact that such policies are sensitive with…

Machine Learning · Computer Science · 2022-01-03 · Tien Mai, Patrick Jaillet

The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against the parameter uncertainties due to the mismatches between the simulator model and real-world settings. An RMDP problem is…

Machine Learning · Computer Science · 2022-05-17 · Kishan Panaganti, Dileep Kalathil

Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go. However, real-world deployment of end-to-end RL models is less common, as RL models can be very sensitive to slight…

Machine Learning · Computer Science · 2022-09-29 · Jing Dong, Jingwei Li, Baoxiang Wang, Jingzhao Zhang

In many real-world scenarios, Reinforcement Learning (RL) algorithms are trained on data with dynamics shift, i.e., with different underlying environment dynamics. Most current methods address this issue by training context…

Machine Learning · Computer Science · 2024-02-23 · Zhenghai Xue, Qingpeng Cai, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, Bo An

To overcome the curses of dimensionality and modeling of Dynamic Programming (DP) methods to solve Markov Decision Process (MDP) problems, Reinforcement Learning (RL) methods are adopted in practice. Contrary to traditional RL algorithms…

Machine Learning · Computer Science · 2021-08-24 · Arghyadip Roy, Vivek Borkar, Abhay Karandikar, Prasanna Chaporkar

This paper concerns the central issues of model robustness and sample efficiency in offline reinforcement learning (RL), which aims to learn decision making from historical data without active exploration. Due to uncertainties and…

Machine Learning · Computer Science · 2024-01-01 · Laixi Shi, Yuejie Chi

The policy gradient method enjoys the simplicity of an objective in which the agent optimizes the cumulative reward directly. Moreover, in the continuous action domain, a parameterized action distribution allows easy control of…

Machine Learning · Computer Science · 2022-12-16 · Md Masudur Rahman, Yexiang Xue

Policy-based Reinforcement Learning (RL) has established itself as the dominant paradigm in generative recommendation for optimizing sequential user interactions. However, when applied to offline historical logs, these methods suffer a…

Machine Learning · Computer Science · 2026-02-12 · Jie Jiang, Yusen Huo, Xiangxin Zhan, Changping Wang, Jun Zhang

The sim-to-real gap, which represents the disparity between training and testing environments, poses a significant challenge in reinforcement learning (RL). A promising approach to addressing this challenge is distributionally robust RL,…

Machine Learning · Computer Science · 2024-11-05 · Miao Lu, Han Zhong, Tong Zhang, Jose Blanchet

Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), the widely employed policy-based reinforcement learning (RL) methods, are prone to converge to a sub-optimal solution as they limit the policy representation…

Machine Learning · Computer Science · 2020-06-16 · Jun Song, Chaoyue Zhao

This paper addresses the problem of model-free reinforcement learning for Robust Markov Decision Process (RMDP) with large state spaces. The goal of the RMDP framework is to find a policy that is robust against the parameter uncertainties…

Machine Learning · Computer Science · 2021-02-15 · Kishan Panaganti, Dileep Kalathil

We introduce a novel class of algorithms to efficiently approximate the unknown return distributions in policy evaluation problems from distributional reinforcement learning (DRL). The proposed distributional dynamic programming algorithms…

Machine Learning · Statistics · 2024-07-22 · Julian Gerstenberg, Ralph Neininger, Denis Spiegel

In offline reinforcement learning (RL), the absence of active exploration calls for attention to model robustness to tackle the sim-to-real gap, where the discrepancy between the simulated and deployed environments can significantly…

Machine Learning · Computer Science · 2024-06-28 · He Wang, Laixi Shi, Yuejie Chi

Distributionally Robust Optimization (DRO) has made it possible to prove the equivalence between robustness and regularization in classification and regression, thus providing an analytical reason why regularization generalizes well in statistical…

Optimization and Control · Mathematics · 2020-07-15 · Esther Derman, Shie Mannor

In high-stakes scenarios like medical treatment and autopiloting, it is risky or even infeasible to collect online experimental data to train the agent. Simulation-based training can alleviate this issue, but may suffer from its inherent…

Machine Learning · Computer Science · 2022-03-16 · Jialian Li, Tongzheng Ren, Dong Yan, Hang Su, Jun Zhu