Related papers: Distillation Policy Optimization

Off-Policy Actor-Critic

This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on…

Machine Learning · Computer Science 2015-03-20 Thomas Degris, Martha White, Richard S. Sutton

Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach

Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can improve data efficiency by repeatedly using the previously gathered data. However, off-policy learning becomes challenging when the discrepancy…

Machine Learning · Computer Science 2023-09-27 Baturay Saglam, Dogan C. Cicek, Furkan B. Mutlu, Suleyman S. Kozat

Behaviour Policy Optimization: Provably Lower Variance Return Estimates for Off-Policy Reinforcement Learning

Many reinforcement learning algorithms, particularly those that rely on return estimates for policy improvement, can suffer from poor sample efficiency and training instability due to high-variance return estimates. In this paper we…

Machine Learning · Computer Science 2026-01-06 Alexander W. Goodall, Edwin Hamel-De le Court, Francesco Belardinelli

Generalized Proximal Policy Optimization with Sample Reuse

In real-world decision making tasks, it is critical for data-driven reinforcement learning methods to be both stable and sample efficient. On-policy methods typically generate reliable policy improvement throughout training, while…

Machine Learning · Computer Science 2021-11-02 James Queeney, Ioannis Ch. Paschalidis, Christos G. Cassandras

Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement Learning Using Unique Experiences

Efficient utilization of the replay buffer plays a significant role in the off-policy actor-critic reinforcement learning (RL) algorithms used for model-free control policy synthesis for complex dynamical systems. We propose a method for…

Machine Learning · Computer Science 2024-02-13 Nikhil Kumar Singh, Indranil Saha

Variance-Reduced Off-Policy Memory-Efficient Policy Search

Off-policy policy optimization is a challenging problem in reinforcement learning (RL). The algorithms designed for this problem often suffer from high variance in their estimators, which results in poor sample efficiency, and have issues…

Machine Learning · Computer Science 2020-09-15 Daoming Lyu, Qi Qi, Mohammad Ghavamzadeh, Hengshuai Yao, Tianbao Yang, Bo Liu

Stable and Efficient Policy Evaluation

Policy evaluation algorithms are essential to reinforcement learning due to their ability to predict the performance of a policy. However, there are two long-standing issues lying in this prediction problem that need to be tackled:…

Machine Learning · Computer Science 2021-12-30 Daoming Lyu, Bo Liu, Matthieu Geist, Wen Dong, Saad Biaz, Qi Wang

Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning

We study the problem of off-policy critic evaluation in several variants of value-based off-policy actor-critic algorithms. Off-policy actor-critic algorithms require an off-policy critic evaluation step, to estimate the value of the new…

Machine Learning · Computer Science 2019-12-12 Riashat Islam, Raihan Seraj, Samin Yeasar Arnob, Doina Precup

Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction

Improving the sample efficiency of reinforcement learning algorithms requires effective exploration. Following the principle of optimism in the face of uncertainty (OFU), we train a separate exploration policy to maximize the…

Machine Learning · Computer Science 2022-11-23 Jiachen Li, Shuo Cheng, Zhenyu Liao, Huayan Wang, William Yang Wang, Qinxun Bai

Off-Policy Actor-Critic with Shared Experience Replay

We investigate the combination of actor-critic reinforcement learning algorithms with uniform large-scale experience replay and propose solutions for two challenges: (a) efficient actor-critic learning with experience replay (b) stability…

Machine Learning · Computer Science 2019-11-19 Simon Schmitt, Matteo Hessel, Karen Simonyan

Variance Penalized On-Policy and Off-Policy Actor-Critic

Reinforcement learning algorithms are typically geared towards optimizing the expected return of an agent. However, in many practical applications, low variance in the return is desired to ensure the reliability of an algorithm. In this…

Machine Learning · Computer Science 2021-02-04 Arushi Jain, Gandharv Patil, Ayush Jain, Khimya Khetarpal, Doina Precup

Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning

Reinforcement learning (RL) algorithms are often categorized as either on-policy or off-policy depending on whether they use data from a target policy of interest or from a different behavior policy. In this paper, we study a subtle…

Machine Learning · Computer Science 2022-10-12 Rujie Zhong, Duohan Zhang, Lukas Schäfer, Stefano V. Albrecht, Josiah P. Hanna

When Do Off-Policy and On-Policy Policy Gradient Methods Align?

Policy gradient methods are widely adopted reinforcement learning algorithms for tasks with continuous action spaces. These methods have succeeded in many application domains; however, because of their notorious sample inefficiency, their use…

Machine Learning · Statistics 2024-02-20 Davide Mambelli, Stephan Bongers, Onno Zoeter, Matthijs T. J. Spaan, Frans A. Oliehoek

Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning

We identify two issues with the family of algorithms based on the Adversarial Imitation Learning framework. The first problem is implicit bias present in the reward functions used in these algorithms. While these biases might work well for…

Machine Learning · Computer Science 2018-10-16 Ilya Kostrikov, Kumar Krishna Agrawal, Debidatta Dwibedi, Sergey Levine, Jonathan Tompson

Improved off-policy training of diffusion samplers

We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function. We benchmark several diffusion-structured inference methods, including simulation-based variational…

Machine Learning · Computer Science 2025-01-15 Marcin Sendera, Minsu Kim, Sarthak Mittal, Pablo Lemos, Luca Scimeca, Jarrid Rector-Brooks, Alexandre Adam, Yoshua Bengio, Nikolay Malkin

Explaining Off-Policy Actor-Critic From A Bias-Variance Perspective

Off-policy Actor-Critic algorithms have demonstrated phenomenal experimental performance but still require better explanations. To this end, we show that their policy evaluation error on the distribution of transitions decomposes into: a Bellman…

Machine Learning · Computer Science 2021-10-07 Ting-Han Fan, Peter J. Ramadge

An Approximate Policy Iteration Viewpoint of Actor-Critic Algorithms

In this work, we consider policy-based methods for solving the reinforcement learning problem, and establish the sample complexity guarantees. A policy-based algorithm typically consists of an actor and a critic. We consider using various…

Machine Learning · Computer Science 2023-01-16 Zaiwei Chen, Siva Theja Maguluri

A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward

We develop an off-policy actor-critic algorithm for learning an optimal policy from a training set composed of data from multiple individuals. This algorithm is developed with a view towards its use in mobile health.

Machine Learning · Statistics 2016-07-19 S. A. Murphy, Y. Deng, E. B. Laber, H. R. Maei, R. S. Sutton, K. Witkiewitz

Data-Based Efficient Off-Policy Stabilizing Optimal Control Algorithms for Discrete-Time Linear Systems via Damping Coefficients

Policy iteration is one of the classical frameworks of reinforcement learning, which requires a known initial stabilizing control. However, finding the initial stabilizing control depends on the known system model. To relax this requirement…

Systems and Control · Electrical Eng. & Systems 2025-03-20 Dongdong Li, Jiuxiang Dong

Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

Off-policy model-free deep reinforcement learning methods using previously collected data can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to…

Machine Learning · Computer Science 2017-06-02 Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Bernhard Schölkopf, Sergey Levine