Related papers: Value Improved Actor Critic Algorithms

Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the…

Machine Learning · Computer Science · 2023-11-01 · Sharan Vaswani, Amirreza Kazemi, Reza Babanezhad, Nicolas Le Roux

An Approximate Policy Iteration Viewpoint of Actor-Critic Algorithms

In this work, we consider policy-based methods for solving the reinforcement learning problem, and establish the sample complexity guarantees. A policy-based algorithm typically consists of an actor and a critic. We consider using various…

Machine Learning · Computer Science · 2023-01-16 · Zaiwei Chen, Siva Theja Maguluri

Enhancing Deep Deterministic Policy Gradients on Continuous Control Tasks with Decoupled Prioritized Experience Replay

Background: Deep Deterministic Policy Gradient-based reinforcement learning algorithms utilize Actor-Critic architectures, where both networks are typically trained using identical batches of replayed transitions. However, the learning…

Machine Learning · Computer Science · 2025-12-08 · Mehmet Efe Lorasdagi, Dogan Can Cicek, Furkan Burak Mutlu, Suleyman Serdar Kozat

Learning Value Functions in Deep Policy Gradients using Residual Variance

Policy gradient algorithms have proven to be successful in diverse decision making and control tasks. However, these methods suffer from high sample complexity and instability issues. In this paper, we address these challenges by providing…

Machine Learning · Computer Science · 2021-03-17 · Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux

Off-Policy Actor-Critic

This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on…

Machine Learning · Computer Science · 2015-03-20 · Thomas Degris, Martha White, Richard S. Sutton

Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement

Many policy gradient methods are variants of Actor-Critic (AC), where a value function (critic) is learned to facilitate updating the parameterized policy (actor). The update to the actor involves a log-likelihood update weighted by the…

Machine Learning · Computer Science · 2023-03-02 · Samuel Neumann, Sungsu Lim, Ajin Joseph, Yangchen Pan, Adam White, Martha White

Actor-Director-Critic: A Novel Deep Reinforcement Learning Framework

In this paper, we propose actor-director-critic, a new framework for deep reinforcement learning. Compared with the actor-critic framework, the director role is added, and action classification and action evaluation are applied…

Machine Learning · Computer Science · 2023-01-11 · Zongwei Liu, Yonghong Song, Yuanlin Zhang

Parameter Sharing Deep Deterministic Policy Gradient for Cooperative Multi-agent Reinforcement Learning

Deep reinforcement learning for multi-agent cooperation and competition has been a hot topic recently. This paper focuses on the cooperative multi-agent problem, using actor-critic methods in settings with local observations. Multi-agent deep…

Artificial Intelligence · Computer Science · 2017-10-04 · Xiangxiang Chu, Hangjun Ye

On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation

Reinforcement learning, mathematically described by Markov Decision Problems, may be approached either through dynamic programming or policy search. Actor-critic algorithms combine the merits of both approaches by alternating between steps…

Machine Learning · Computer Science · 2023-01-31 · Harshat Kumar, Alec Koppel, Alejandro Ribeiro

Actor-Critic Reinforcement Learning with Phased Actor

Policy gradient methods in actor-critic reinforcement learning (RL) have become perhaps the most promising approaches to solving continuous optimal control problems. However, the trial-and-error nature of RL and the inherent randomness…

Machine Learning · Computer Science · 2024-04-19 · Ruofan Wu, Junmin Zhong, Jennie Si

GRAC: Self-Guided and Self-Regularized Actor-Critic

Deep reinforcement learning (DRL) algorithms have successfully been demonstrated on a range of challenging decision making and control tasks. One dominant component of recent deep reinforcement learning algorithms is the target network…

Machine Learning · Computer Science · 2020-11-12 · Lin Shao, Yifan You, Mengyuan Yan, Qingyun Sun, Jeannette Bohg

RN-D: Discretized Categorical Actors with Regularized Networks for On-Policy Reinforcement Learning

On-policy deep reinforcement learning remains a dominant paradigm for continuous control, yet standard implementations rely on Gaussian actors and relatively shallow MLP policies, often leading to brittle optimization when gradients are…

Machine Learning · Computer Science · 2026-02-02 · Yuexin Bian, Jie Feng, Tao Wang, Yijiang Li, Sicun Gao, Yuanyuan Shi

Distributional Advantage Actor-Critic

In traditional reinforcement learning, an agent maximizes the reward collected during its interaction with the environment by approximating the optimal policy through the estimation of value functions. Typically, given a state s and action…

Machine Learning · Computer Science · 2018-06-20 · Shangda Li, Selina Bing, Steven Yang

How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization

Deterministic-policy actor-critic algorithms for continuous control improve the actor by plugging its actions into the critic and ascending the action-value gradient, which is obtained by chaining the actor's Jacobian matrix with the…

Artificial Intelligence · Computer Science · 2020-10-23 · Pierluca D'Oro, Wojciech Jaśkowski
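The entry above describes how deterministic-policy actor-critic methods improve the actor: in the standard deterministic policy gradient, grad_theta Q(s, mu_theta(s)) = (d mu_theta(s) / d theta)^T * dQ/da evaluated at a = mu_theta(s). The following is a minimal sketch of that chain rule only, not the model-based estimator proposed in the paper. It assumes a scalar state and action, a hypothetical linear actor mu_theta(s) = theta[0] * s + theta[1], and a hand-written toy critic Q(s, a) = -(a - 2s)^2; all function names are illustrative. The chain-rule gradient is checked against central finite differences and then used for plain gradient ascent on the actor.

```python
# Minimal sketch of the deterministic policy gradient chain rule.
# The actor and critic below are hypothetical toy functions, not taken from any paper.

def actor(theta, s):
    """Deterministic linear actor: mu_theta(s) = theta[0] * s + theta[1]."""
    return theta[0] * s + theta[1]


def actor_jacobian(theta, s):
    """Partial derivatives of mu_theta(s) with respect to theta[0] and theta[1]."""
    return [s, 1.0]


def critic(s, a):
    """Toy critic Q(s, a) = -(a - 2s)^2, maximized at a = 2s."""
    return -(a - 2.0 * s) ** 2


def critic_action_grad(s, a):
    """dQ/da for the toy critic."""
    return -2.0 * (a - 2.0 * s)


def dpg_actor_gradient(theta, s):
    """grad_theta Q(s, mu_theta(s)) = (d mu / d theta)^T * dQ/da at a = mu_theta(s)."""
    dq_da = critic_action_grad(s, actor(theta, s))
    return [j * dq_da for j in actor_jacobian(theta, s)]


def finite_difference_gradient(theta, s, eps=1e-6):
    """Central-difference estimate of the same gradient, used as a check."""
    grads = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps
        minus[i] -= eps
        diff = critic(s, actor(plus, s)) - critic(s, actor(minus, s))
        grads.append(diff / (2.0 * eps))
    return grads


# Gradient ascent on the actor, averaging the chain-rule gradient over a fixed batch of states.
theta = [0.5, 0.0]
states = [0.5, 1.0, 1.5]
learning_rate = 0.1
for _ in range(2000):
    batch_grads = [dpg_actor_gradient(theta, s) for s in states]
    mean_grad = [sum(g[i] for g in batch_grads) / len(states) for i in range(len(theta))]
    theta = [t + learning_rate * g for t, g in zip(theta, mean_grad)]

print(dpg_actor_gradient([0.5, 0.0], 1.0), finite_difference_gradient([0.5, 0.0], 1.0))
print(theta)  # close to [2.0, 0.0], where the actor outputs the critic's argmax a = 2s
```

Because the critic here is exact and quadratic, the actor converges to the critic's maximizer; with a learned critic, the quality of dQ/da directly limits the quality of the actor update.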

Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions

In reinforcement learning, off-policy actor-critic methods like DDPG and TD3 use deterministic policy gradients: the Q-function is learned from environment data, while the actor maximizes it via gradient ascent. We observe that in complex…

Machine Learning · Computer Science · 2025-10-13 · Ayush Jain, Norio Kosaka, Xinhu Li, Kyung-Min Kim, Erdem Bıyık, Joseph J. Lim

Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations

Pretraining with expert demonstrations has been found useful in speeding up the training of deep reinforcement learning algorithms, since less online simulation data is required. Some approaches use supervised learning to speed up the…

Artificial Intelligence · Computer Science · 2018-02-12 · Xiaoqin Zhang, Huimin Ma

Distributed Neural Policy Gradient Algorithm for Global Convergence of Networked Multi-Agent Reinforcement Learning

This paper studies the networked multi-agent reinforcement learning (NMARL) problem, where the objective of agents is to collaboratively maximize the discounted average cumulative rewards. Different from the existing methods that suffer…

Multiagent Systems · Computer Science · 2025-06-02 · Pengcheng Dai, Yuanqiu Mo, Wenwu Yu, Wei Ren

Broad Critic Deep Actor Reinforcement Learning for Continuous Control

In the domain of continuous control, deep reinforcement learning (DRL) demonstrates promising results. However, the dependence of DRL on deep neural networks (DNNs) results in the demand for extensive data and increased computational cost.…

Machine Learning · Computer Science · 2025-04-15 · Shiron Thalagala, Pak Kin Wong, Xiaozheng Wang, Tianang Sun

Compatible Gradient Approximations for Actor-Critic Algorithms

Deterministic policy gradient algorithms are foundational for actor-critic methods in controlling continuous systems, yet they often encounter inaccuracies due to their dependence on the derivative of the critic's value estimates with…

Machine Learning · Computer Science · 2025-02-11 · Baturay Saglam, Dionysis Kalogerias

Reinforcement Learning for Learning Rate Control

Stochastic gradient descent (SGD), which updates the model parameters by adding a local gradient times a learning rate at each step, is widely used in model training of machine learning algorithms such as neural networks. It is observed…

Machine Learning · Computer Science · 2017-06-01 · Chang Xu, Tao Qin, Gang Wang, Tie-Yan Liu
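The SGD update summarized in the entry above can be illustrated with a minimal sketch. In the standard formulation the parameter moves against the gradient, theta <- theta - eta * grad L(theta), where eta is the learning rate. The sketch assumes a one-parameter least-squares problem on hypothetical noise-free data; the helper `sgd_fit_slope` is illustrative, and it uses a fixed learning rate rather than the reinforcement-learning controller proposed in that paper.

```python
import random


def sgd_fit_slope(data, learning_rate, epochs, seed=0):
    """Fit y ~ w * x with plain SGD on the squared error, one example per update."""
    rng = random.Random(seed)
    w = 0.0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a fresh random order each epoch
        for i in order:
            x, y = data[i]
            grad = 2.0 * (w * x - y) * x  # d/dw of (w * x - y)^2 for this single example
            w -= learning_rate * grad  # step of size learning_rate against the local gradient
    return w


# Hypothetical noise-free data on the line y = 2x.
data = [(k / 10.0, 2.0 * k / 10.0) for k in range(1, 11)]

w_slow = sgd_fit_slope(data, learning_rate=0.01, epochs=50)
w_fast = sgd_fit_slope(data, learning_rate=0.1, epochs=50)
print(w_slow, w_fast)
```

With the same budget of 50 epochs, the smaller learning rate still leaves w noticeably below 2, while 0.1 reaches it to within floating-point precision: a small illustration of why the choice of learning rate matters in practice.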