Related papers: TD-Regularized Actor-Critic Methods

Double Actor-Critic with TD Error-Driven Regularization in Reinforcement Learning

To obtain better value estimation in reinforcement learning, we propose a novel algorithm based on the double actor-critic framework with temporal difference error-driven regularization, abbreviated as TDDR. TDDR employs double actors, with…

Machine Learning · Computer Science · 2024-10-01 · Haohui Chen, Zhiyong Chen, Aoxiang Liu, Wentuo Fang

Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the…

Machine Learning · Computer Science · 2023-11-01 · Sharan Vaswani, Amirreza Kazemi, Reza Babanezhad, Nicolas Le Roux

Mitigating Estimation Errors by Twin TD-Regularized Actor and Critic for Deep Reinforcement Learning

We address the issue of estimation bias in deep reinforcement learning (DRL) by introducing solution mechanisms that include a new, twin TD-regularized actor-critic (TDR) method. It aims at reducing both over and under-estimation errors.…

Machine Learning · Computer Science · 2023-11-08 · Junmin Zhong, Ruofan Wu, Jennie Si

An Approximate Policy Iteration Viewpoint of Actor-Critic Algorithms

In this work, we consider policy-based methods for solving the reinforcement learning problem, and establish the sample complexity guarantees. A policy-based algorithm typically consists of an actor and a critic. We consider using various…

Machine Learning · Computer Science · 2023-01-16 · Zaiwei Chen, Siva Theja Maguluri

Asynchronous Actor-Critic for Multi-Agent Reinforcement Learning

Synchronizing decisions across multiple agents in realistic settings is problematic since it requires agents to wait for other agents to terminate and communicate about termination reliably. Ideally, agents should learn and execute…

Machine Learning · Computer Science · 2022-10-12 · Yuchen Xiao, Weihao Tan, Christopher Amato

Mitigating Estimation Bias with Representation Learning in TD Error-Driven Regularization

Deterministic policy gradient algorithms for continuous control suffer from value estimation biases that degrade performance. While double critics reduce such biases, the exploration potential of double actors remains underexplored.…

Machine Learning · Computer Science · 2025-11-21 · Haohui Chen, Zhiyong Chen, Aoxiang Liu, Wentuo Fang

Guide Actor-Critic for Continuous Control

Actor-critic methods solve reinforcement learning problems by updating a parameterized policy known as an actor in a direction that increases an estimate of the expected return known as a critic. However, existing actor-critic methods only…

Machine Learning · Statistics · 2018-02-23 · Voot Tangkaratt, Abbas Abdolmaleki, Masashi Sugiyama

A Finite Time Analysis of Two Time-Scale Actor Critic Methods

Actor-critic (AC) methods have exhibited great empirical success compared with other reinforcement learning algorithms, where the actor uses the policy gradient to improve the learning policy and the critic uses temporal difference learning…

Machine Learning · Computer Science · 2022-10-11 · Yue Wu, Weitong Zhang, Pan Xu, Quanquan Gu

Gradient Temporal-Difference Learning with Regularized Corrections

It is still common to use Q-learning and temporal difference (TD) learning (even though they have divergence issues and sound Gradient TD alternatives exist) because divergence seems rare and they typically perform well. However, recent work…

Machine Learning · Computer Science · 2020-09-21 · Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White

On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation

Reinforcement learning, mathematically described by Markov Decision Problems, may be approached either through dynamic programming or policy search. Actor-critic algorithms combine the merits of both approaches by alternating between steps…

Machine Learning · Computer Science · 2023-01-31 · Harshat Kumar, Alec Koppel, Alejandro Ribeiro

Analysis of a Target-Based Actor-Critic Algorithm with Linear Function Approximation

Actor-critic methods integrating target networks have exhibited a stupendous empirical success in deep reinforcement learning. However, a theoretical understanding of the use of target networks in actor-critic methods is largely missing in…

Machine Learning · Computer Science · 2022-02-24 · Anas Barakat, Pascal Bianchi, Julien Lehmann

Actor-Critic Reinforcement Learning with Phased Actor

Policy gradient methods in actor-critic reinforcement learning (RL) have become perhaps the most promising approaches to solving continuous optimal control problems. However, the trial-and-error nature of RL and the inherent randomness…

Machine Learning · Computer Science · 2024-04-19 · Ruofan Wu, Junmin Zhong, Jennie Si

Stochastic Actor-Critic: Mitigating Overestimation via Temporal Aleatoric Uncertainty

Off-policy actor-critic methods in reinforcement learning train a critic with temporal-difference updates and use it as a learning signal for the policy (actor). This design typically achieves higher sample efficiency than purely on-policy…

Machine Learning · Computer Science · 2026-01-05 · Uğurcan Özalp

Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a variety of continuous control tasks. Normally, the critic's action-value function is updated using temporal-difference, and the critic in turn provides a loss for the…

Machine Learning · Computer Science · 2020-11-03 · Wei Zhou, Yiying Li, Yongxin Yang, Huaimin Wang, Timothy M. Hospedales

Efficient Continuous Control with Double Actors and Regularized Critics

How to obtain good value estimation is one of the key problems in Reinforcement Learning (RL). Current value estimation methods, such as DDPG and TD3, suffer from unnecessary over- or underestimation bias. In this paper, we explore the…

Machine Learning · Computer Science · 2021-06-08 · Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Xiu Li

Multi-State TD Target for Model-Free Reinforcement Learning

Temporal difference (TD) learning is a fundamental technique in reinforcement learning that updates value estimates for states or state-action pairs using a TD target. This target represents an improved estimate of the true value by…

Machine Learning · Computer Science · 2024-08-05 · Wuhao Wang, Zhiyong Chen, Lepeng Zhang

Mirror descent actor-critic methods for entropy regularised MDPs in general spaces: stability and convergence

We provide theoretical guarantees for convergence of discrete-time policy mirror descent with inexact advantage functions updated using temporal difference (TD) learning for entropy regularised MDPs in Polish state and action spaces. We…

Optimization and Control · Mathematics · 2026-02-12 · Denis Zorba, David Šiška, Lukasz Szpruch

Optimistic critics can empower small actors

Actor-critic methods have been central to many of the recent advances in deep reinforcement learning. The most common approach is to use symmetric architectures, whereby both actor and critic have the same network topology and number of…

Machine Learning · Computer Science · 2025-08-15 · Olya Mastikhina, Dhruv Sreenivas, Pablo Samuel Castro

A Convergent Online Single Time Scale Actor Critic Algorithm

Actor-Critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their generality, good convergence properties, and possible biological…

Machine Learning · Computer Science · 2009-09-17 · D. Di Castro, R. Meir

Decentralized Multi-Agent Actor-Critic with Generative Inference

Recent multi-agent actor-critic methods have utilized centralized training with decentralized execution to address the non-stationarity of co-adapting agents. This training paradigm constrains learning to the centralized phase such that…

Multiagent Systems · Computer Science · 2019-10-09 · Kevin Corder, Manuel M. Vindiola, Keith Decker