Related papers: Variance Adjusted Actor Critic Algorithms

200 papers

Several recent works have focused on carrying out non-asymptotic convergence analyses for AC algorithms. Recently, a two-timescale critic-actor algorithm has been presented for the discounted cost setting in the look-up table case where the…

Machine Learning · Computer Science · 2025-09-01 · Prashansa Panda, Shalabh Bhatnagar

We analyze the global convergence of the single-timescale actor-critic (AC) algorithm for the infinite-horizon discounted Markov Decision Processes (MDPs) with finite state spaces. To this end, we introduce an elegant analytical framework…

Machine Learning · Computer Science · 2025-06-05 · Navdeep Kumar, Priyank Agrawal, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the…

Machine Learning · Computer Science · 2023-11-01 · Sharan Vaswani, Amirreza Kazemi, Reza Babanezhad, Nicolas Le Roux

In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in rewards in addition to maximizing a standard criterion. Variance related risk measures are among the most common…

Machine Learning · Computer Science · 2015-03-19 · Prashanth L. A., Mohammad Ghavamzadeh

Although actor-critic methods have been successful in practice, their theoretical analyses have several limitations. Specifically, existing theoretical work either sidesteps the exploration problem by making strong assumptions or analyzes…

Machine Learning · Computer Science · 2026-04-02 · Max Qiushi Lin, Reza Asad, Kevin Tan, Haque Ishfaq, Csaba Szepesvari, Sharan Vaswani

We reformulate the option framework as two parallel augmented MDPs. Under this novel formulation, all policy optimization algorithms can be used off the shelf to learn intra-option policies, option termination conditions, and a master…

Machine Learning · Computer Science · 2019-09-12 · Shangtong Zhang, Shimon Whiteson

We propose an actor-critic framework to solve the time-continuous stochastic optimal control problem. A least square temporal difference method is applied to compute the value function for the critic. The policy gradient method is…

Optimization and Control · Mathematics · 2025-01-27 · Mo Zhou, Jianfeng Lu

In this paper, we present a probability-one convergence proof, under suitable conditions, of a certain class of actor-critic algorithms for finding approximate solutions to entropy-regularized MDPs using the machinery of stochastic…

Machine Learning · Computer Science · 2019-10-23 · Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, Ji Liu

Deterministic policy gradient algorithms are foundational for actor-critic methods in controlling continuous systems, yet they often encounter inaccuracies due to their dependence on the derivative of the critic's value estimates with…

Machine Learning · Computer Science · 2025-02-11 · Baturay Saglam, Dionysis Kalogerias

Reinforcement learning algorithms are typically geared towards optimizing the expected return of an agent. However, in many practical applications, low variance in the return is desired to ensure the reliability of an algorithm. In this…

Machine Learning · Computer Science · 2021-02-04 · Arushi Jain, Gandharv Patil, Ayush Jain, Khimya Khetarpal, Doina Precup

Actor-Critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their generality, good convergence properties, and possible biological…

Machine Learning · Computer Science · 2009-09-17 · D. Di Castro, R. Meir

We propose a novel actor-critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process. The actor incorporates a descent direction that is motivated by the solution of a certain non-linear…

Machine Learning · Computer Science · 2015-07-30 · Prashanth L. A., H. L. Prasad, Shalabh Bhatnagar, Prakash Chandra

We introduce a class of variational actor-critic algorithms based on a variational formulation over both the value function and the policy. The objective function of the variational formulation consists of two parts: one for maximizing the…

Machine Learning · Computer Science · 2023-01-18 · Yuhua Zhu, Lexing Ying

Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind. In a broader context, actor-critic can be viewed as an online alternating update algorithm for bilevel optimization, whose convergence…

Machine Learning · Computer Science · 2019-07-16 · Zhuoran Yang, Yongxin Chen, Mingyi Hong, Zhaoran Wang

In this paper, we study the role of the critic in actor-critic for entropy-regularized, finite, discounted environments. We establish that, when the critic is exact, using the latter as a baseline is a variance-reduction method in a strong…

Machine Learning · Computer Science · 2026-05-26 · Safwan Labbi, Paul Mangold, Daniil Tiapkin, Eric Moulines

We study a new two-time-scale stochastic gradient method for solving optimization problems, where the gradients are computed with the aid of an auxiliary variable under samples generated by time-varying MDPs controlled by the underlying…

Optimization and Control · Mathematics · 2024-08-27 · Sihan Zeng, Thinh T. Doan, Justin Romberg

We consider the estimation of the policy gradient in partially observable Markov decision processes (POMDP) with a special class of structured policies that are finite-state controllers. We show that the gradient estimation can be done in…

Machine Learning · Computer Science · 2012-07-09 · Huizhen Yu

Motivated by applications in risk-sensitive reinforcement learning, we study mean-variance optimization in a discounted reward Markov Decision Process (MDP). Specifically, we analyze a Temporal Difference (TD) learning algorithm with linear…

Machine Learning · Computer Science · 2025-03-13 · Tejaram Sangadi, L. A. Prashanth, Krishna Jagannathan

Actor-critic (AC) is a powerful method for learning an optimal policy in reinforcement learning, where the critic uses algorithms, e.g., temporal difference (TD) learning with function approximation, to evaluate the current policy and the…

Machine Learning · Computer Science · 2024-06-05 · Yudan Wang, Yue Wang, Yi Zhou, Shaofeng Zou

We prove the stability and global convergence of a coupled actor-critic gradient flow for infinite-horizon and entropy-regularised Markov decision processes (MDPs) in continuous state and action space with linear function approximation…

Optimization and Control · Mathematics · 2025-10-17 · Denis Zorba, David Šiška, Lukasz Szpruch