Related papers: An Actor-Critic Algorithm with Function Approximat…

Risk-Sensitive Reinforcement Learning with Exponential Criteria

While reinforcement learning has shown experimental success in a number of applications, it is known to be sensitive to noise and perturbations in the parameters of the system, leading to high variance in the total reward amongst different…

Systems and Control · Electrical Eng. & Systems 2024-12-02 Erfaun Noorani, Christos Mavridis, John Baras

Compatible Gradient Approximations for Actor-Critic Algorithms

Deterministic policy gradient algorithms are foundational for actor-critic methods in controlling continuous systems, yet they often encounter inaccuracies due to their dependence on the derivative of the critic's value estimates with…

Machine Learning · Computer Science 2025-02-11 Baturay Saglam, Dionysis Kalogerias

Primal-Only Actor Critic Algorithm for Robust Constrained Average Cost MDPs

In this work, we study the problem of finding robust and safe policies in Robust Constrained Average-Cost Markov Decision Processes (RCMDPs). A key challenge in this setting is the lack of strong duality, which prevents the direct use of…

Machine Learning · Computer Science 2025-11-11 Anirudh Satheesh, Sooraj Sathish, Swetha Ganesh, Keenan Powell, Vaneet Aggarwal

Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs

In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in rewards in addition to maximizing a standard criterion. Variance related risk measures are among the most common…

Machine Learning · Computer Science 2015-03-19 Prashanth L. A., Mohammad Ghavamzadeh

On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation

Reinforcement learning, mathematically described by Markov Decision Problems, may be approached either through dynamic programming or policy search. Actor-critic algorithms combine the merits of both approaches by alternating between steps…

Machine Learning · Computer Science 2023-01-31 Harshat Kumar, Alec Koppel, Alejandro Ribeiro

A Function Approximation Approach to Estimation of Policy Gradient for POMDP with Structured Policies

We consider the estimation of the policy gradient in partially observable Markov decision processes (POMDP) with a special class of structured policies that are finite-state controllers. We show that the gradient estimation can be done in…

Machine Learning · Computer Science 2012-07-09 Huizhen Yu

Optimistic Actor-Critic with Parametric Policies for Linear Markov Decision Processes

Although actor-critic methods have been successful in practice, their theoretical analyses have several limitations. Specifically, existing theoretical work either sidesteps the exploration problem by making strong assumptions or analyzes…

Machine Learning · Computer Science 2026-04-02 Max Qiushi Lin, Reza Asad, Kevin Tan, Haque Ishfaq, Csaba Szepesvari, Sharan Vaswani

A constrained optimization perspective on actor critic algorithms and application to network routing

We propose a novel actor-critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process. The actor incorporates a descent direction that is motivated by the solution of a certain non-linear…

Machine Learning · Computer Science 2015-07-30 Prashanth L. A., H. L. Prasad, Shalabh Bhatnagar, Prakash Chandra

Markov Decision Processes with Risk-Sensitive Criteria: An Overview

The paper provides an overview of the theory and applications of risk-sensitive Markov decision processes. The term 'risk-sensitive' refers here to the use of the Optimized Certainty Equivalent as a means to measure expectation and risk.…

Risk Management · Quantitative Finance 2025-09-23 Nicole Bäuerle, Anna Jaśkiewicz

Finite Time Analysis of Constrained Natural Critic-Actor Algorithm with Improved Sample Complexity

Recent studies have increasingly focused on non-asymptotic convergence analyses for actor-critic (AC) algorithms. One such effort introduced a two-timescale critic-actor algorithm for the discounted cost setting using a tabular…

Machine Learning · Computer Science 2025-10-07 Prashansa Panda, Shalabh Bhatnagar

A Natural Actor-Critic Algorithm with Downside Risk Constraints

Existing work on risk-sensitive reinforcement learning - both for symmetric and downside risk measures - has typically used direct Monte-Carlo estimation of policy gradients. While this approach yields unbiased gradient estimates, it also…

Machine Learning · Computer Science 2020-07-09 Thomas Spooner, Rahul Savani

Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation

We present the first class of policy-gradient algorithms that work with both state-value and policy function-approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning…

Artificial Intelligence · Computer Science 2018-02-23 Hamid Reza Maei

Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes

We consider a discounted cost constrained Markov decision process (CMDP) policy optimization problem, in which an agent seeks to maximize a discounted cumulative reward subject to a number of constraints on discounted cumulative utilities.…

Optimization and Control · Mathematics 2024-11-21 Sihan Zeng, Thinh T. Doan, Justin Romberg

Risk-Sensitive Soft Actor-Critic for Robust Deep Reinforcement Learning under Distribution Shifts

We study the robustness of deep reinforcement learning algorithms against distribution shifts within contextual multi-stage stochastic combinatorial optimization problems from the operations research domain. In this context, risk-sensitive…

Machine Learning · Computer Science 2024-02-16 Tobias Enders, James Harrison, Maximilian Schiffer

Risk-Sensitive Partially Observable Markov Decision Processes as Fully Observable Multivariate Utility Optimization problems

We provide a new algorithm for solving Risk-Sensitive Partially Observable Markov Decision Processes, when the risk is modeled by a utility function, and both the state space and the space of observations are finite. This algorithm is based…

Optimization and Control · Mathematics 2022-07-19 Arsham Afsardeir, Andreas Kapetanis, Vaios Laschos, Klaus Obermayer

Risk Conditioned Neural Motion Planning

Risk-bounded motion planning is an important yet difficult problem for safety-critical tasks. While existing mathematical programming methods offer theoretical guarantees in the context of constrained Markov decision processes, they either…

Machine Learning · Computer Science 2021-08-05 Xin Huang, Meng Feng, Ashkan Jasour, Guy Rosman, Brian Williams

On the function approximation error for risk-sensitive reinforcement learning

In this paper we obtain several informative error bounds on function approximation for the policy evaluation algorithm proposed by Basu et al. when the aim is to find the risk-sensitive cost represented using exponential utility. The main…

Machine Learning · Computer Science 2019-10-23 Prasenjit Karmakar, Shalabh Bhatnagar

Critic Algorithms using Cooperative Networks

An algorithm is proposed for policy evaluation in Markov Decision Processes which gives good empirical results with respect to convergence rates. The algorithm tracks the Projected Bellman Error and is implemented as a true gradient based…

Artificial Intelligence · Computer Science 2022-01-21 Debangshu Banerjee, Kavita Wagh

Continuous-time Markov decision processes under the risk-sensitive average cost criterion

This paper studies continuous-time Markov decision processes under the risk-sensitive average cost criterion. The state space is a finite set, the action space is a Borel space, the cost and transition rates are bounded, and the…

Optimization and Control · Mathematics 2015-12-22 Qingda Wei, Xian Chen

Verification of Markov Decision Processes with Risk-Sensitive Measures

We develop a method for computing policies in Markov decision processes with risk-sensitive measures subject to temporal logic constraints. Specifically, we use a particular risk-sensitive measure from cumulative prospect theory, which has…

Artificial Intelligence · Computer Science 2020-04-21 Murat Cubuktepe, Ufuk Topcu