Related papers: Variational Actor-Critic Algorithms

Explaining Off-Policy Actor-Critic From A Bias-Variance Perspective

Off-policy Actor-Critic algorithms have demonstrated phenomenal experimental performance but still require better explanations. To this end, we show its policy evaluation error on the distribution of transitions decomposes into: a Bellman…

Machine Learning · Computer Science 2021-10-07 Ting-Han Fan, Peter J. Ramadge

Variance Penalized On-Policy and Off-Policy Actor-Critic

Reinforcement learning algorithms are typically geared towards optimizing the expected return of an agent. However, in many practical applications, low variance in the return is desired to ensure the reliability of an algorithm. In this…

Machine Learning · Computer Science 2021-02-04 Arushi Jain, Gandharv Patil, Ayush Jain, Khimya Khetarpal, Doina Precup

Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

Actor-critic methods are widely used in offline reinforcement learning practice, but are not so well-understood theoretically. We propose a new offline actor-critic algorithm that naturally incorporates the pessimism principle, leading to…

Machine Learning · Computer Science 2021-08-20 Andrea Zanette, Martin J. Wainwright, Emma Brunskill

Learning Value Functions in Deep Policy Gradients using Residual Variance

Policy gradient algorithms have proven to be successful in diverse decision making and control tasks. However, these methods suffer from high sample complexity and instability issues. In this paper, we address these challenges by providing…

Machine Learning · Computer Science 2021-03-17 Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux

Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation

We present the first class of policy-gradient algorithms that work with both state-value and policy function-approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning…

Artificial Intelligence · Computer Science 2018-02-23 Hamid Reza Maei

Outcome-Driven Reinforcement Learning via Variational Inference

While reinforcement learning algorithms provide automated acquisition of optimal policies, practical application of such methods requires a number of design decisions, such as manually designing reward functions that not only define the…

Machine Learning · Computer Science 2022-12-29 Tim G. J. Rudner, Vitchyr H. Pong, Rowan McAllister, Yarin Gal, Sergey Levine

Compatible Gradient Approximations for Actor-Critic Algorithms

Deterministic policy gradient algorithms are foundational for actor-critic methods in controlling continuous systems, yet they often encounter inaccuracies due to their dependence on the derivative of the critic's value estimates with…

Machine Learning · Computer Science 2025-02-11 Baturay Saglam, Dionysis Kalogerias

A Natural Actor-Critic Algorithm with Downside Risk Constraints

Existing work on risk-sensitive reinforcement learning - both for symmetric and downside risk measures - has typically used direct Monte-Carlo estimation of policy gradients. While this approach yields unbiased gradient estimates, it also…

Machine Learning · Computer Science 2020-07-09 Thomas Spooner, Rahul Savani

Global Optimization for Value Function Approximation

Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation,…

Artificial Intelligence · Computer Science 2010-06-15 Marek Petrik, Shlomo Zilberstein

Boosting the Actor with Dual Critic

This paper proposes a new actor-critic-style algorithm called Dual Actor-Critic or Dual-AC. It is derived in a principled way from the Lagrangian dual form of the Bellman optimality equation, which can be viewed as a two-player game between…

Machine Learning · Computer Science 2018-01-01 Bo Dai, Albert Shaw, Niao He, Lihong Li, Le Song

Variance Adjusted Actor Critic Algorithms

We present an actor-critic framework for MDPs where the objective is the variance-adjusted expected return. Our critic uses linear function approximation, and we extend the concept of compatible features to the variance-adjusted setting. We…

Machine Learning · Statistics 2013-10-15 Aviv Tamar, Shie Mannor

Error Controlled Actor-Critic

Approximation error of the value function inevitably causes an overestimation phenomenon and has a negative impact on the convergence of the algorithms. To mitigate the negative effects of the approximation error, we propose Error Controlled Actor-critic…

Machine Learning · Computer Science 2021-09-08 Xingen Gao, Fei Chao, Changle Zhou, Zhen Ge, Chih-Min Lin, Longzhi Yang, Xiang Chang, Changjing Shang

Actor-Critics Can Achieve Optimal Sample Efficiency

Actor-critic algorithms have become a cornerstone in reinforcement learning (RL), leveraging the strengths of both policy-based and value-based methods. Despite recent progress in understanding their statistical efficiency, no existing work…

Machine Learning · Statistics 2025-05-07 Kevin Tan, Wei Fan, Yuting Wei

Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the…

Machine Learning · Computer Science 2023-11-01 Sharan Vaswani, Amirreza Kazemi, Reza Babanezhad, Nicolas Le Roux

Revisiting stochastic off-policy action-value gradients

Off-policy stochastic actor-critic methods rely on approximating the stochastic policy gradient in order to derive an optimal policy. One may also derive the optimal policy by approximating the action-value gradient. The use of action-value…

Machine Learning · Statistics 2017-03-14 Yemi Okesanjo, Victor Kofia

Towards an Unified Structure for Reinforcement Learning: an Optimization Approach

Both the optimal value function and the optimal policy can be used to model an optimal controller based on the duality established by the Bellman equation. Even with this duality, no parametric model has been able to output both policy and…

Systems and Control · Electrical Eng. & Systems 2020-06-02 Jicheng Shi, Yingzhao Lian, Colin N. Jones

Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy

We study the global convergence and global optimality of actor-critic, one of the most popular families of reinforcement learning algorithms. While most existing works on actor-critic employ bi-level or two-timescale updates, we focus on…

Machine Learning · Computer Science 2021-06-15 Zuyue Fu, Zhuoran Yang, Zhaoran Wang

Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning

We propose A-Crab (Actor-Critic Regularized by Average Bellman error), a new practical algorithm for offline reinforcement learning (RL) in complex environments with insufficient data coverage. Our algorithm combines the marginalized…

Machine Learning · Computer Science 2023-10-10 Hanlin Zhu, Paria Rashidinejad, Jiantao Jiao

On the Second-Order Convergence of Biased Policy Gradient Algorithms

Since the objective functions of reinforcement learning problems are typically highly nonconvex, it is desirable that policy gradient, the most popular algorithm, escapes saddle points and arrives at second-order stationary points. Existing…

Machine Learning · Computer Science 2024-05-15 Siqiao Mu, Diego Klabjan

Adaptive Bases for Reinforcement Learning

We consider the problem of reinforcement learning using function approximation, where the approximating basis can change dynamically while interacting with the environment. A motivation for such an approach is maximizing the value function…

Machine Learning · Computer Science 2010-05-04 Dotan Di Castro, Shie Mannor