Related papers: Off-Policy Actor-Critic

Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation

We present the first class of policy-gradient algorithms that work with both state-value and policy function-approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning…

Artificial Intelligence · Computer Science 2018-02-23 Hamid Reza Maei

A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

This paper extends off-policy reinforcement learning to the multi-agent case in which a set of networked agents communicating with their neighbors according to a time-varying graph collaboratively evaluates and improves a target policy…

Machine Learning · Computer Science 2019-11-20 Wesley Suttle , Zhuoran Yang , Kaiqing Zhang , Zhaoran Wang , Tamer Basar , Ji Liu

An Off-policy Policy Gradient Theorem Using Emphatic Weightings

Policy gradient methods are widely used for control in reinforcement learning, particularly for the continuous action setting. There have been a host of theoretically sound algorithms proposed for the on-policy setting, due to the existence…

Machine Learning · Computer Science 2019-06-21 Ehsan Imani , Eric Graves , Martha White

Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning

We study the problem of off-policy critic evaluation in several variants of value-based off-policy actor-critic algorithms. Off-policy actor-critic algorithms require an off-policy critic evaluation step, to estimate the value of the new…

Machine Learning · Computer Science 2019-12-12 Riashat Islam , Raihan Seraj , Samin Yeasar Arnob , Doina Precup

Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach

Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can improve data efficiency by repeatedly using the previously gathered data. However, off-policy learning becomes challenging when the discrepancy…

Machine Learning · Computer Science 2023-09-27 Baturay Saglam , Dogan C. Cicek , Furkan B. Mutlu , Suleyman S. Kozat

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and…

Machine Learning · Computer Science 2018-08-10 Tuomas Haarnoja , Aurick Zhou , Pieter Abbeel , Sergey Levine

Off-Policy Actor-Critic with Shared Experience Replay

We investigate the combination of actor-critic reinforcement learning algorithms with uniform large-scale experience replay and propose solutions for two challenges: (a) efficient actor-critic learning with experience replay (b) stability…

Machine Learning · Computer Science 2019-11-19 Simon Schmitt , Matteo Hessel , Karen Simonyan

Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a variety of continuous control tasks. Normally, the critic's action-value function is updated using temporal-difference, and the critic in turn provides a loss for the…

Machine Learning · Computer Science 2020-11-03 Wei Zhou , Yiying Li , Yongxin Yang , Huaimin Wang , Timothy M. Hospedales

Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus

In this paper, we propose a distributed off-policy actor critic method to solve multi-agent reinforcement learning problems. Specifically, we assume that all agents keep local estimates of the global optimal policy parameter and update…

Machine Learning · Computer Science 2019-03-25 Yan Zhang , Michael M. Zavlanos

A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward

We develop an off-policy actor-critic algorithm for learning an optimal policy from a training set composed of data from multiple individuals. This algorithm is developed with a view towards its use in mobile health.

Machine Learning · Statistics 2016-07-19 S. A. Murphy , Y. Deng , E. B. Laber , H. R. Maei , R. S. Sutton , K. Witkiewitz

Generative Actor-Critic: An Off-policy Algorithm Using the Push-forward Model

Model-free deep reinforcement learning has achieved great success in many domains, such as video games, recommendation systems and robotic control tasks. In continuous control tasks, widely used policies with Gaussian distributions results…

Machine Learning · Computer Science 2023-06-05 Lingwei Peng , Hui Qian , Zhebang Shen , Chao Zhang , Fei Li

An Approximate Policy Iteration Viewpoint of Actor-Critic Algorithms

In this work, we consider policy-based methods for solving the reinforcement learning problem, and establish the sample complexity guarantees. A policy-based algorithm typically consists of an actor and a critic. We consider using various…

Machine Learning · Computer Science 2023-01-16 Zaiwei Chen , Siva Theja Maguluri

Distillation Policy Optimization

While on-policy algorithms are known for their stability, they often demand a substantial number of samples. In contrast, off-policy algorithms, which leverage past experiences, are considered sample-efficient but tend to exhibit…

Machine Learning · Computer Science 2023-09-28 Jianfei Ma

The Actor-Advisor: Policy Gradient With Off-Policy Advice

Actor-critic algorithms learn an explicit policy (actor), and an accompanying value function (critic). The actor performs actions in the environment, while the critic evaluates the actor's current policy. However, despite their stability…

Artificial Intelligence · Computer Science 2019-02-08 Hélène Plisnier , Denis Steckelmacher , Diederik M. Roijers , Ann Nowé

Policy Gradient Methods for Off-policy Control

Off-policy learning refers to the problem of learning the value function of a way of behaving, or policy, while following a different policy. Gradient-based off-policy learning algorithms, such as GTD and TDC/GQ, converge even when using…

Artificial Intelligence · Computer Science 2015-12-15 Lucas Lehnert , Doina Precup

Variance Penalized On-Policy and Off-Policy Actor-Critic

Reinforcement learning algorithms are typically geared towards optimizing the expected return of an agent. However, in many practical applications, low variance in the return is desired to ensure the reliability of an algorithm. In this…

Machine Learning · Computer Science 2021-02-04 Arushi Jain , Gandharv Patil , Ayush Jain , Khimya Khetarpal , Doina Precup

Generalized Off-Policy Actor-Critic

We propose a new objective, the counterfactual objective, unifying existing objectives for off-policy policy gradient algorithms in the continuing reinforcement learning (RL) setting. Compared to the commonly used excursion objective, which…

Machine Learning · Computer Science 2019-10-29 Shangtong Zhang , Wendelin Boehmer , Shimon Whiteson

Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

Actor-critic methods are widely used in offline reinforcement learning practice, but are not so well-understood theoretically. We propose a new offline actor-critic algorithm that naturally incorporates the pessimism principle, leading to…

Machine Learning · Computer Science 2021-08-20 Andrea Zanette , Martin J. Wainwright , Emma Brunskill

Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL

Off-policy reinforcement learning (RL) has achieved notable success in tackling many complex real-world tasks, by leveraging previously collected data for policy learning. However, most existing off-policy RL algorithms fail to maximally…

Machine Learning · Computer Science 2024-05-30 Yu Luo , Tianying Ji , Fuchun Sun , Jianwei Zhang , Huazhe Xu , Xianyuan Zhan

Cautious Actor-Critic

The oscillating performance of off-policy learning and persisting errors in the actor-critic (AC) setting call for algorithms that can conservatively learn to suit the stability-critical applications better. In this paper, we propose a…

Machine Learning · Computer Science 2021-10-06 Lingwei Zhu , Toshinori Kitamura , Takamitsu Matsubara