Related papers: Guide Actor-Critic for Continuous Control

Distributional Policy Optimization: An Alternative Approach for Continuous Control

We identify a fundamental problem in policy gradient-based methods in continuous control. As policy gradient methods require the agent's underlying probability distribution, they limit policy representation to parametric distribution…

Machine Learning · Computer Science 2019-11-26 Chen Tessler, Guy Tennenholtz, Shie Mannor

Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the…

Machine Learning · Computer Science 2023-11-01 Sharan Vaswani, Amirreza Kazemi, Reza Babanezhad, Nicolas Le Roux

Adversarially Guided Actor-Critic

Despite definite success in deep reinforcement learning problems, actor-critic algorithms are still confronted with sample inefficiency in complex environments, particularly in tasks where efficient exploration is a bottleneck. These…

Machine Learning · Computer Science 2021-02-09 Yannis Flet-Berliac, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

Actor-Critic Reinforcement Learning with Phased Actor

Policy gradient methods in actor-critic reinforcement learning (RL) have become perhaps the most promising approaches to solving continuous optimal control problems. However, the trial-and-error nature of RL and the inherent randomness…

Machine Learning · Computer Science 2024-04-19 Ruofan Wu, Junmin Zhong, Jennie Si

Characterizing the Gap Between Actor-Critic and Policy Gradient

Actor-critic (AC) methods are ubiquitous in reinforcement learning. Although it is understood that AC methods are closely related to policy gradient (PG), their precise connection has not been fully characterized previously. In this paper,…

Artificial Intelligence · Computer Science 2021-06-15 Junfeng Wen, Saurabh Kumar, Ramki Gummadi, Dale Schuurmans

Generative Actor Critic

Conventional Reinforcement Learning (RL) algorithms, typically focused on estimating or maximizing expected returns, face challenges when refining offline pretrained models with online experiences. This paper introduces Generative Actor…

Machine Learning · Computer Science 2025-12-29 Aoyang Qin, Deqian Kong, Wei Wang, Ying Nian Wu, Song-Chun Zhu, Sirui Xie

Mean Actor Critic

We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent's explicit representation of all action values to estimate the gradient…

Machine Learning · Statistics 2018-05-24 Cameron Allen, Kavosh Asadi, Melrose Roderick, Abdel-rahman Mohamed, George Konidaris, Michael Littman

Generative Actor-Critic: An Off-policy Algorithm Using the Push-forward Model

Model-free deep reinforcement learning has achieved great success in many domains, such as video games, recommendation systems, and robotic control tasks. In continuous control tasks, widely used policies with Gaussian distributions result…

Machine Learning · Computer Science 2023-06-05 Lingwei Peng, Hui Qian, Zhebang Shen, Chao Zhang, Fei Li

Adviser-Actor-Critic: Eliminating Steady-State Error in Reinforcement Learning Control

High-precision control tasks present substantial challenges for reinforcement learning (RL) algorithms, frequently resulting in suboptimal performance attributed to network approximation inaccuracies and inadequate sample quality. These…

Machine Learning · Computer Science 2025-02-05 Donghe Chen, Yubin Peng, Tengjie Zheng, Han Wang, Chaoran Qu, Lin Cheng

On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation

Reinforcement learning, mathematically described by Markov Decision Problems, may be approached either through dynamic programming or policy search. Actor-critic algorithms combine the merits of both approaches by alternating between steps…

Machine Learning · Computer Science 2023-01-31 Harshat Kumar, Alec Koppel, Alejandro Ribeiro

Potential Field Guided Actor-Critic Reinforcement Learning

In this paper, we consider the problem of actor-critic reinforcement learning. Firstly, we extend the actor-critic architecture to actor-critic-N architecture by introducing more critics beyond rewards. Secondly, we combine the reward-based…

Machine Learning · Computer Science 2020-06-15 Weiya Ren

Quasi-Newton Compatible Actor-Critic for Deterministic Policies

In this paper, we propose a second-order deterministic actor-critic framework in reinforcement learning that extends the classical deterministic policy gradient method to exploit curvature information of the performance function. Building…

Machine Learning · Computer Science 2025-11-13 Arash Bahari Kordabad, Dean Brandner, Sebastien Gros, Sergio Lucia, Sadegh Soudjani

Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning

This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial…

Computation and Language · Computer Science 2018-02-09 Baolin Peng, Xiujun Li, Jianfeng Gao, Jingjing Liu, Yun-Nung Chen, Kam-Fai Wong

How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization

Deterministic-policy actor-critic algorithms for continuous control improve the actor by plugging its actions into the critic and ascending the action-value gradient, which is obtained by chaining the actor's Jacobian matrix with the…

Artificial Intelligence · Computer Science 2020-10-23 Pierluca D'Oro, Wojciech Jaśkowski

Behavior-Guided Actor-Critic: Improving Exploration via Learning Policy Behavior Representation for Deep Reinforcement Learning

In this work, we propose Behavior-Guided Actor-Critic (BAC), an off-policy actor-critic deep RL algorithm. BAC mathematically formulates the behavior of the policy through autoencoders by providing an accurate estimation of how frequently…

Machine Learning · Computer Science 2021-04-12 Ammar Fayad, Majd Ibrahim

Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a variety of continuous control tasks. Normally, the critic's action-value function is updated using temporal-difference, and the critic in turn provides a loss for the…

Machine Learning · Computer Science 2020-11-03 Wei Zhou, Yiying Li, Yongxin Yang, Huaimin Wang, Timothy M. Hospedales

Actor-Critic based Improper Reinforcement Learning

We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform…

Machine Learning · Computer Science 2022-07-20 Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

An Approximate Policy Iteration Viewpoint of Actor-Critic Algorithms

In this work, we consider policy-based methods for solving the reinforcement learning problem, and establish sample complexity guarantees. A policy-based algorithm typically consists of an actor and a critic. We consider using various…

Machine Learning · Computer Science 2023-01-16 Zaiwei Chen, Siva Theja Maguluri

TD-Regularized Actor-Critic Methods

Actor-critic methods can achieve incredible performance on difficult reinforcement learning problems, but they are also prone to instability. This is partly due to the interaction between the actor and critic during learning, e.g., an…

Machine Learning · Computer Science 2019-02-26 Simone Parisi, Voot Tangkaratt, Jan Peters, Mohammad Emtiyaz Khan

Boosting the Actor with Dual Critic

This paper proposes a new actor-critic-style algorithm called Dual Actor-Critic or Dual-AC. It is derived in a principled way from the Lagrangian dual form of the Bellman optimality equation, which can be viewed as a two-player game between…

Machine Learning · Computer Science 2018-01-01 Bo Dai, Albert Shaw, Niao He, Lihong Li, Le Song