Related papers: Variance Adjusted Actor Critic Algorithms

Two-Timescale Critic-Actor for Average Reward MDPs with Function Approximation

Several recent works have focused on carrying out non-asymptotic convergence analyses for AC algorithms. Recently, a two-timescale critic-actor algorithm has been presented for the discounted cost setting in the look-up table case where the…

Machine Learning · Computer Science 2025-09-01 Prashansa Panda, Shalabh Bhatnagar

On the Convergence of Single-Timescale Actor-Critic

We analyze the global convergence of the single-timescale actor-critic (AC) algorithm for the infinite-horizon discounted Markov Decision Processes (MDPs) with finite state spaces. To this end, we introduce an elegant analytical framework…

Machine Learning · Computer Science 2025-06-05 Navdeep Kumar, Priyank Agrawal, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor

Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the…

Machine Learning · Computer Science 2023-11-01 Sharan Vaswani, Amirreza Kazemi, Reza Babanezhad, Nicolas Le Roux

Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs

In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in rewards in addition to maximizing a standard criterion. Variance related risk measures are among the most common…

Machine Learning · Computer Science 2015-03-19 Prashanth L. A., Mohammad Ghavamzadeh

Optimistic Actor-Critic with Parametric Policies for Linear Markov Decision Processes

Although actor-critic methods have been successful in practice, their theoretical analyses have several limitations. Specifically, existing theoretical work either sidesteps the exploration problem by making strong assumptions or analyzes…

Machine Learning · Computer Science 2026-04-02 Max Qiushi Lin, Reza Asad, Kevin Tan, Haque Ishfaq, Csaba Szepesvari, Sharan Vaswani

DAC: The Double Actor-Critic Architecture for Learning Options

We reformulate the option framework as two parallel augmented MDPs. Under this novel formulation, all policy optimization algorithms can be used off the shelf to learn intra-option policies, option termination conditions, and a master…

Machine Learning · Computer Science 2019-09-12 Shangtong Zhang, Shimon Whiteson

Solving Time-Continuous Stochastic Optimal Control Problems: Algorithm Design and Convergence Analysis of Actor-Critic Flow

We propose an actor-critic framework to solve the time-continuous stochastic optimal control problem. A least square temporal difference method is applied to compute the value function for the critic. The policy gradient method is…

Optimization and Control · Mathematics 2025-01-27 Mo Zhou, Jianfeng Lu

A Convergence Result for Regularized Actor-Critic Methods

In this paper, we present a probability one convergence proof, under suitable conditions, of a certain class of actor-critic algorithms for finding approximate solutions to entropy-regularized MDPs using the machinery of stochastic…

Machine Learning · Computer Science 2019-10-23 Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, Ji Liu

Compatible Gradient Approximations for Actor-Critic Algorithms

Deterministic policy gradient algorithms are foundational for actor-critic methods in controlling continuous systems, yet they often encounter inaccuracies due to their dependence on the derivative of the critic's value estimates with…

Machine Learning · Computer Science 2025-02-11 Baturay Saglam, Dionysis Kalogerias

Variance Penalized On-Policy and Off-Policy Actor-Critic

Reinforcement learning algorithms are typically geared towards optimizing the expected return of an agent. However, in many practical applications, low variance in the return is desired to ensure the reliability of an algorithm. In this…

Machine Learning · Computer Science 2021-02-04 Arushi Jain, Gandharv Patil, Ayush Jain, Khimya Khetarpal, Doina Precup

A Convergent Online Single Time Scale Actor Critic Algorithm

Actor-Critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their generality, good convergence properties, and possible biological…

Machine Learning · Computer Science 2009-09-17 D. Di Castro, R. Meir

A constrained optimization perspective on actor critic algorithms and application to network routing

We propose a novel actor-critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process. The actor incorporates a descent direction that is motivated by the solution of a certain non-linear…

Machine Learning · Computer Science 2015-07-30 Prashanth L. A., H. L. Prasad, Shalabh Bhatnagar, Prakash Chandra

Variational Actor-Critic Algorithms

We introduce a class of variational actor-critic algorithms based on a variational formulation over both the value function and the policy. The objective function of the variational formulation consists of two parts: one for maximizing the…

Machine Learning · Computer Science 2023-01-18 Yuhua Zhu, Lexing Ying

On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost

Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind. In a broader context, actor-critic can be viewed as an online alternating update algorithm for bilevel optimization, whose convergence…

Machine Learning · Computer Science 2019-07-16 Zhuoran Yang, Yongxin Chen, Mingyi Hong, Zhaoran Wang

Refined Analysis of Entropy-Regularized Actor-Critic

In this paper, we study the role of the critic in actor-critic for entropy-regularized, finite, discounted environments. We establish that, when the critic is exact, using the latter as a baseline is a variance-reduction method in a strong…

Machine Learning · Computer Science 2026-05-26 Safwan Labbi, Paul Mangold, Daniil Tiapkin, Eric Moulines

A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

We study a new two-time-scale stochastic gradient method for solving optimization problems, where the gradients are computed with the aid of an auxiliary variable under samples generated by time-varying MDPs controlled by the underlying…

Optimization and Control · Mathematics 2024-08-27 Sihan Zeng, Thinh T. Doan, Justin Romberg

A Function Approximation Approach to Estimation of Policy Gradient for POMDP with Structured Policies

We consider the estimation of the policy gradient in partially observable Markov decision processes (POMDP) with a special class of structured policies that are finite-state controllers. We show that the gradient estimation can be done in…

Machine Learning · Computer Science 2012-07-09 Huizhen Yu

A Finite-Sample Analysis of an Actor-Critic Algorithm for Mean-Variance Optimization in a Discounted MDP

Motivated by applications in risk-sensitive reinforcement learning, we study mean-variance optimization in a discounted reward Markov Decision Process (MDP). Specifically, we analyze a Temporal Difference (TD) learning algorithm with linear…

Machine Learning · Computer Science 2025-03-13 Tejaram Sangadi, L. A. Prashanth, Krishna Jagannathan

Non-Asymptotic Analysis for Single-Loop (Natural) Actor-Critic with Compatible Function Approximation

Actor-critic (AC) is a powerful method for learning an optimal policy in reinforcement learning, where the critic uses algorithms, e.g., temporal difference (TD) learning with function approximation, to evaluate the current policy and the…

Machine Learning · Computer Science 2024-06-05 Yudan Wang, Yue Wang, Yi Zhou, Shaofeng Zou

Convergence of actor-critic for entropy regularised MDPs in general action spaces

We prove the stability and global convergence of a coupled actor-critic gradient flow for infinite-horizon and entropy-regularised Markov decision processes (MDPs) in continuous state and action space with linear function approximation…

Optimization and Control · Mathematics 2025-10-17 Denis Zorba, David Šiška, Lukasz Szpruch