Related papers: Proximal Deterministic Policy Gradient

Offline Primal-Dual Reinforcement Learning for Linear MDPs

Offline Reinforcement Learning (RL) aims to learn a near-optimal policy from a fixed dataset of transitions collected by another policy. This problem has attracted a lot of attention recently, but most existing methods with strong…

Machine Learning · Computer Science 2023-05-23 Germano Gabbianelli, Gergely Neu, Nneka Okolo, Matteo Papini

A Minimalist Approach to Offline Reinforcement Learning

Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data. Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms take the approach of constraining or regularizing…

Machine Learning · Computer Science 2021-12-06 Scott Fujimoto, Shixiang Shane Gu

Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization

Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy that maximizes the cumulative rewards in sequential decision making. Most methods in the existing…

Machine Learning · Statistics 2023-01-06 Chengchun Shi, Zhengling Qi, Jianing Wang, Fan Zhou

Relative Entropy Regularized Policy Iteration

We present an off-policy actor-critic algorithm for Reinforcement Learning (RL) that combines ideas from gradient-free optimization via stochastic search with learned action-value function. The result is a simple procedure consisting of…

Machine Learning · Computer Science 2018-12-07 Abbas Abdolmaleki, Jost Tobias Springenberg, Jonas Degrave, Steven Bohez, Yuval Tassa, Dan Belov, Nicolas Heess, Martin Riedmiller

DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning

This paper prescribes a suite of techniques for off-policy Reinforcement Learning (RL) that simplify the training process and reduce the sample complexity. First, we show that simple Deterministic Policy Gradient works remarkably well as…

Machine Learning · Computer Science 2020-06-30 Rasool Fakoor, Pratik Chaudhari, Alexander J. Smola

Trajectory-Based Off-Policy Deep Reinforcement Learning

Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently…

Machine Learning · Computer Science 2019-05-15 Andreas Doerr, Michael Volpp, Marc Toussaint, Sebastian Trimpe, Christian Daniel

Evaluation-Time Policy Switching for Offline Reinforcement Learning

Offline reinforcement learning (RL) looks at learning how to optimally solve tasks using a fixed dataset of interactions from the environment. Many off-policy algorithms developed for online learning struggle in the offline setting as they…

Machine Learning · Computer Science 2025-03-18 Natinael Solomon Neggatu, Jeremie Houssineau, Giovanni Montana

Behavior Proximal Policy Optimization

Offline reinforcement learning (RL) is a challenging setting where existing off-policy actor-critic methods perform poorly due to the overestimation of out-of-distribution state-action pairs. Thus, various additional augmentations are…

Machine Learning · Computer Science 2023-02-23 Zifeng Zhuang, Kun Lei, Jinxin Liu, Donglin Wang, Yilang Guo

Behaviour Policy Optimization: Provably Lower Variance Return Estimates for Off-Policy Reinforcement Learning

Many reinforcement learning algorithms, particularly those that rely on return estimates for policy improvement, can suffer from poor sample efficiency and training instability due to high-variance return estimates. In this paper we…

Machine Learning · Computer Science 2026-01-06 Alexander W. Goodall, Edwin Hamel-De le Court, Francesco Belardinelli

Bi-Level Offline Policy Optimization with Limited Exploration

We study offline reinforcement learning (RL) which seeks to learn a good policy based on a fixed, pre-collected dataset. A fundamental challenge behind this task is the distributional shift due to the dataset lacking sufficient exploration,…

Machine Learning · Computer Science 2023-10-11 Wenzhuo Zhou

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

We study the problem of off-policy value evaluation in reinforcement learning (RL), where one aims to estimate the value of a new policy based on data collected by a different policy. This problem is often a critical step when applying RL…

Machine Learning · Computer Science 2016-05-27 Nan Jiang, Lihong Li

More Efficient Off-Policy Evaluation through Regularized Targeted Learning

We study the problem of off-policy evaluation (OPE) in Reinforcement Learning (RL), where the aim is to estimate the performance of a new policy given historical data that may have been generated by a different policy, or policies. In…

Machine Learning · Computer Science 2019-12-16 Aurélien F. Bibaut, Ivana Malenica, Nikos Vlassis, Mark J. van der Laan

Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm

Learning optimal behavior from existing data is one of the most important problems in Reinforcement Learning (RL). This is known as "off-policy control" in RL where an agent's objective is to compute an optimal policy based on the data…

Machine Learning · Computer Science 2022-06-16 Raghuram Bharadwaj Diddigi, Prateek Jain, Prabuchandran K. J., Shalabh Bhatnagar

Offline Reinforcement Learning with Closed-Form Policy Improvement Operators

Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning. By exploiting historical transitions, a policy is trained to maximize a learned value function while…

Machine Learning · Computer Science 2023-07-25 Jiachen Li, Edwin Zhang, Ming Yin, Qinxun Bai, Yu-Xiang Wang, William Yang Wang

Offline RL Without Off-Policy Evaluation

Most prior approaches to offline reinforcement learning (RL) have taken an iterative actor-critic approach involving off-policy evaluation. In this paper we show that simply doing one step of constrained/regularized policy improvement using…

Machine Learning · Computer Science 2021-12-06 David Brandfonbrener, William F. Whitney, Rajesh Ranganath, Joan Bruna

Off-Policy Primal-Dual Safe Reinforcement Learning

Primal-dual safe RL methods commonly perform iterations between the primal update of the policy and the dual update of the Lagrange Multiplier. Such a training paradigm is highly susceptible to the error in cumulative cost estimation since…

Machine Learning · Computer Science 2024-04-16 Zifan Wu, Bo Tang, Qian Lin, Chao Yu, Shangqin Mao, Qianlong Xie, Xingxing Wang, Dong Wang

Stochastic Primal-Dual Q-Learning

In this work, we present a new model-free and off-policy reinforcement learning (RL) algorithm, that is capable of finding a near-optimal policy with state-action observations from arbitrary behavior policies. Our algorithm, called the…

Optimization and Control · Mathematics 2025-07-21 Narim Jeong, Donghwan Lee, Niao He

Proximal Policy Optimization for Tracking Control Exploiting Future Reference Information

In recent years, reinforcement learning (RL) has gained increasing attention in control engineering. Especially, policy gradient methods are widely used. In this work, we improve the tracking performance of proximal policy optimization…

Machine Learning · Computer Science 2021-07-21 Jana Mayer, Johannes Westermann, Juan Pedro Gutiérrez H. Muriedas, Uwe Mettin, Alexander Lampe

Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables

Deep reinforcement learning algorithms require large amounts of experience to learn an individual task. While in principle meta-reinforcement learning (meta-RL) algorithms enable agents to learn new skills from small amounts of experience,…

Machine Learning · Computer Science 2019-03-21 Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn, Sergey Levine

Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions

Offline reinforcement learning (RL) allows for the training of competent agents from offline datasets without any interaction with the environment. Online finetuning of such offline models can further improve performance. But how should we…

Machine Learning · Computer Science 2023-03-31 Yicheng Luo, Jackie Kay, Edward Grefenstette, Marc Peter Deisenroth