Related papers: An Actor-Critic Algorithm with Function Approximat…

200 papers

While reinforcement learning has shown experimental success in a number of applications, it is known to be sensitive to noise and perturbations in the parameters of the system, leading to high variance in the total reward amongst different…

Systems and Control · Electrical Eng. & Systems 2024-12-02 Erfaun Noorani, Christos Mavridis, John Baras

Deterministic policy gradient algorithms are foundational for actor-critic methods in controlling continuous systems, yet they often encounter inaccuracies due to their dependence on the derivative of the critic's value estimates with…

Machine Learning · Computer Science 2025-02-11 Baturay Saglam, Dionysis Kalogerias

In this work, we study the problem of finding robust and safe policies in Robust Constrained Average-Cost Markov Decision Processes (RCMDPs). A key challenge in this setting is the lack of strong duality, which prevents the direct use of…

Machine Learning · Computer Science 2025-11-11 Anirudh Satheesh, Sooraj Sathish, Swetha Ganesh, Keenan Powell, Vaneet Aggarwal

In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in rewards in addition to maximizing a standard criterion. Variance related risk measures are among the most common…

Machine Learning · Computer Science 2015-03-19 Prashanth L. A., Mohammad Ghavamzadeh

Reinforcement learning, mathematically described by Markov Decision Problems, may be approached either through dynamic programming or policy search. Actor-critic algorithms combine the merits of both approaches by alternating between steps…

Machine Learning · Computer Science 2023-01-31 Harshat Kumar, Alec Koppel, Alejandro Ribeiro

We consider the estimation of the policy gradient in partially observable Markov decision processes (POMDP) with a special class of structured policies that are finite-state controllers. We show that the gradient estimation can be done in…

Machine Learning · Computer Science 2012-07-09 Huizhen Yu

Although actor-critic methods have been successful in practice, their theoretical analyses have several limitations. Specifically, existing theoretical work either sidesteps the exploration problem by making strong assumptions or analyzes…

Machine Learning · Computer Science 2026-04-02 Max Qiushi Lin, Reza Asad, Kevin Tan, Haque Ishfaq, Csaba Szepesvari, Sharan Vaswani

We propose a novel actor-critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process. The actor incorporates a descent direction that is motivated by the solution of a certain non-linear…

Machine Learning · Computer Science 2015-07-30 Prashanth L. A., H. L. Prasad, Shalabh Bhatnagar, Prakash Chandra

The paper provides an overview of the theory and applications of risk-sensitive Markov decision processes. The term 'risk-sensitive' refers here to the use of the Optimized Certainty Equivalent as a means to measure expectation and risk.…

Risk Management · Quantitative Finance 2025-09-23 Nicole Bäuerle, Anna Jaśkiewicz

Recent studies have increasingly focused on non-asymptotic convergence analyses for actor-critic (AC) algorithms. One such effort introduced a two-timescale critic-actor algorithm for the discounted cost setting using a tabular…

Machine Learning · Computer Science 2025-10-07 Prashansa Panda, Shalabh Bhatnagar

Existing work on risk-sensitive reinforcement learning - both for symmetric and downside risk measures - has typically used direct Monte-Carlo estimation of policy gradients. While this approach yields unbiased gradient estimates, it also…

Machine Learning · Computer Science 2020-07-09 Thomas Spooner, Rahul Savani

We present the first class of policy-gradient algorithms that work with both state-value and policy function-approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning…

Artificial Intelligence · Computer Science 2018-02-23 Hamid Reza Maei

We consider a discounted cost constrained Markov decision process (CMDP) policy optimization problem, in which an agent seeks to maximize a discounted cumulative reward subject to a number of constraints on discounted cumulative utilities.…

Optimization and Control · Mathematics 2024-11-21 Sihan Zeng, Thinh T. Doan, Justin Romberg

We study the robustness of deep reinforcement learning algorithms against distribution shifts within contextual multi-stage stochastic combinatorial optimization problems from the operations research domain. In this context, risk-sensitive…

Machine Learning · Computer Science 2024-02-16 Tobias Enders, James Harrison, Maximilian Schiffer

We provide a new algorithm for solving Risk Sensitive Partially Observable Markov Decision Processes, when the risk is modeled by a utility function, and both the state space and the space of observations are finite. This algorithm is based…

Optimization and Control · Mathematics 2022-07-19 Arsham Afsardeir, Andreas Kapetanis, Vaios Laschos, Klaus Obermayer

Risk-bounded motion planning is an important yet difficult problem for safety-critical tasks. While existing mathematical programming methods offer theoretical guarantees in the context of constrained Markov decision processes, they either…

Machine Learning · Computer Science 2021-08-05 Xin Huang, Meng Feng, Ashkan Jasour, Guy Rosman, Brian Williams

In this paper we obtain several informative error bounds on function approximation for the policy evaluation algorithm proposed by Basu et al. when the aim is to find the risk-sensitive cost represented using exponential utility. The main…

Machine Learning · Computer Science 2019-10-23 Prasenjit Karmakar, Shalabh Bhatnagar

An algorithm is proposed for policy evaluation in Markov Decision Processes which gives good empirical results with respect to convergence rates. The algorithm tracks the Projected Bellman Error and is implemented as a true gradient based…

Artificial Intelligence · Computer Science 2022-01-21 Debangshu Banerjee, Kavita Wagh

This paper studies continuous-time Markov decision processes under the risk-sensitive average cost criterion. The state space is a finite set, the action space is a Borel space, the cost and transition rates are bounded, and the…

Optimization and Control · Mathematics 2015-12-22 Qingda Wei, Xian Chen

We develop a method for computing policies in Markov decision processes with risk-sensitive measures subject to temporal logic constraints. Specifically, we use a particular risk-sensitive measure from cumulative prospect theory, which has…

Artificial Intelligence · Computer Science 2020-04-21 Murat Cubuktepe, Ufuk Topcu