Related papers: Error Controlled Actor-Critic

Addressing Function Approximation Error in Actor-Critic Methods

In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting…

Artificial Intelligence · Computer Science 2018-10-23 Scott Fujimoto , Herke van Hoof , David Meger

Estimation Error Correction in Deep Reinforcement Learning for Deterministic Actor-Critic Methods

In value-based deep reinforcement learning methods, approximation of value functions induces overestimation bias and leads to suboptimal policies. We show that in deep actor-critic methods that aim to overcome the overestimation bias, if…

Machine Learning · Computer Science 2021-12-28 Baturay Saglam , Enes Duran , Dogan C. Cicek , Furkan B. Mutlu , Suleyman S. Kozat

Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation

We present the first class of policy-gradient algorithms that work with both state-value and policy function-approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning…

Artificial Intelligence · Computer Science 2018-02-23 Hamid Reza Maei

Moderate Actor-Critic Methods: Controlling Overestimation Bias via Expectile Loss

Overestimation is a fundamental characteristic of model-free reinforcement learning (MF-RL), arising from the principles of temporal difference learning and the approximation of the Q-function. To address this challenge, we propose a novel…

Machine Learning · Computer Science 2025-04-15 Ukjo Hwang , Songnam Hong

Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the…

Machine Learning · Computer Science 2023-11-01 Sharan Vaswani , Amirreza Kazemi , Reza Babanezhad , Nicolas Le Roux

A Theoretical Justification for Asymmetric Actor-Critic Algorithms

In reinforcement learning for partially observable environments, many successful algorithms have been developed within the asymmetric learning paradigm. This paradigm leverages additional state information available at training time for…

Machine Learning · Computer Science 2025-09-09 Gaspard Lambrechts , Damien Ernst , Aditya Mahajan

A Convergent Online Single Time Scale Actor Critic Algorithm

Actor-Critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their generality, good convergence properties, and possible biological…

Machine Learning · Computer Science 2009-09-17 D. Di Castro , R. Meir

On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost

Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind. In a broader context, actor-critic can be viewed as an online alternating update algorithm for bilevel optimization, whose convergence…

Machine Learning · Computer Science 2019-07-16 Zhuoran Yang , Yongxin Chen , Mingyi Hong , Zhaoran Wang

Actor-Critic Learning for Risk-Constrained Linear Quadratic Regulation

In this paper, we investigate the infinite-horizon risk-constrained linear quadratic regulator problem (RC-QR), which augments the classical LQR formulation with a statistical constraint on the variability of the system state to incorporate…

Optimization and Control · Mathematics 2025-10-28 Weijian Li , Andreas A. Malikopoulos

Actor-Critic or Critic-Actor? A Tale of Two Time Scales

We revisit the standard formulation of tabular actor-critic algorithm as a two time-scale stochastic approximation with value function computed on a faster time-scale and policy computed on a slower time-scale. This emulates policy…

Machine Learning · Computer Science 2024-06-21 Shalabh Bhatnagar , Vivek S. Borkar , Soumyajit Guin

Actor-Free Continuous Control via Structurally Maximizable Q-Functions

Value-based algorithms are a cornerstone of off-policy reinforcement learning due to their simplicity and training stability. However, their use has traditionally been restricted to discrete action spaces, as they rely on estimating…

Machine Learning · Computer Science 2025-10-23 Yigit Korkmaz , Urvi Bhuwania , Ayush Jain , Erdem Bıyık

An Actor-Critic Method for Simulation-Based Optimization

We focus on a simulation-based optimization problem of choosing the best design from the feasible space. Although the simulation model can be queried with finite samples, its internal processing rule cannot be utilized in the optimization…

Machine Learning · Computer Science 2021-11-02 Kuo Li , Qing-Shan Jia , Jiaqi Yan

Compatible Gradient Approximations for Actor-Critic Algorithms

Deterministic policy gradient algorithms are foundational for actor-critic methods in controlling continuous systems, yet they often encounter inaccuracies due to their dependence on the derivative of the critic's value estimates with…

Machine Learning · Computer Science 2025-02-11 Baturay Saglam , Dionysis Kalogerias

Better Exploration with Optimistic Actor-Critic

Actor-critic methods, a type of model-free Reinforcement Learning, have been successfully applied to challenging tasks in continuous control, often achieving state-of-the art performance. However, wide-scale adoption of these methods in…

Machine Learning · Statistics 2019-10-29 Kamil Ciosek , Quan Vuong , Robert Loftin , Katja Hofmann

Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap

Warm-Start reinforcement learning (RL), aided by a prior policy obtained from offline training, is emerging as a promising RL approach for practical applications. Recent empirical studies have demonstrated that the performance of Warm-Start…

Machine Learning · Computer Science 2023-06-21 Hang Wang , Sen Lin , Junshan Zhang

PAC-Bayesian Soft Actor-Critic Learning

Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and improvement via two separate function approximators. The practicality of this approach comes at the expense of training instability, caused…

Machine Learning · Computer Science 2024-06-11 Bahareh Tasdighi , Abdullah Akgül , Manuel Haussmann , Kenny Kazimirzak Brink , Melih Kandemir

Actor-Critics Can Achieve Optimal Sample Efficiency

Actor-critic algorithms have become a cornerstone in reinforcement learning (RL), leveraging the strengths of both policy-based and value-based methods. Despite recent progress in understanding their statistical efficiency, no existing work…

Machine Learning · Statistics 2025-05-07 Kevin Tan , Wei Fan , Yuting Wei

An Actor Critic Method for Free Terminal Time Optimal Control

Optimal control problems with free terminal time present many challenges including nonsmooth and discontinuous control laws, irregular value functions, many local optima, and the curse of dimensionality. To overcome these issues, we propose…

Optimization and Control · Mathematics 2022-08-08 Evan Burton , Tenavi Nakamura-Zimmerer , Qi Gong , Wei Kang

Solving Time-Continuous Stochastic Optimal Control Problems: Algorithm Design and Convergence Analysis of Actor-Critic Flow

We propose an actor-critic framework to solve the time-continuous stochastic optimal control problem. A least square temporal difference method is applied to compute the value function for the critic. The policy gradient method is…

Optimization and Control · Mathematics 2025-01-27 Mo Zhou , Jianfeng Lu

On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation

Reinforcement learning, mathematically described by Markov Decision Problems, may be approached either through dynamic programming or policy search. Actor-critic algorithms combine the merits of both approaches by alternating between steps…

Machine Learning · Computer Science 2023-01-31 Harshat Kumar , Alec Koppel , Alejandro Ribeiro