Related papers: Multi-Preference Actor Critic

Natural Policy Gradient and Actor Critic Methods for Constrained Multi-Task Reinforcement Learning

Multi-task reinforcement learning (RL) aims to find a single policy that effectively solves multiple tasks at the same time. This paper presents a constrained formulation for multi-task RL where the goal is to maximize the average…

Optimization and Control · Mathematics 2024-05-07 Sihan Zeng, Thinh T. Doan, Justin Romberg

Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning

In cooperative stochastic games multiple agents work towards learning joint optimal actions in an unknown environment to achieve a common goal. In many real-world applications, however, constraints are often imposed on the actions that can…

Multiagent Systems · Computer Science 2020-07-14 Raghuram Bharadwaj Diddigi, Sai Koti Reddy Danda, Prabuchandran K. J., Shalabh Bhatnagar

Mean Actor Critic

We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent's explicit representation of all action values to estimate the gradient…

Machine Learning · Statistics 2018-05-24 Cameron Allen, Kavosh Asadi, Melrose Roderick, Abdel-rahman Mohamed, George Konidaris, Michael Littman

Actor-Attention-Critic for Multi-Agent Reinforcement Learning

Reinforcement learning in multi-agent scenarios is important for real-world applications but presents challenges beyond those seen in single-agent settings. We present an actor-critic algorithm that trains decentralized policies in…

Machine Learning · Computer Science 2019-05-29 Shariq Iqbal, Fei Sha

Actor-Critic Reinforcement Learning with Phased Actor

Policy gradient methods in actor-critic reinforcement learning (RL) have become perhaps the most promising approaches to solving continuous optimal control problems. However, the trial-and-error nature of RL and the inherent randomness…

Machine Learning · Computer Science 2024-04-19 Ruofan Wu, Junmin Zhong, Jennie Si

PAC-Bayesian Soft Actor-Critic Learning

Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and improvement via two separate function approximators. The practicality of this approach comes at the expense of training instability, caused…

Machine Learning · Computer Science 2024-06-11 Bahareh Tasdighi, Abdullah Akgül, Manuel Haussmann, Kenny Kazimirzak Brink, Melih Kandemir

On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation

Reinforcement learning, mathematically described by Markov Decision Problems, may be approached either through dynamic programming or policy search. Actor-critic algorithms combine the merits of both approaches by alternating between steps…

Machine Learning · Computer Science 2023-01-31 Harshat Kumar, Alec Koppel, Alejandro Ribeiro

Predictive Lagrangian Optimization for Constrained Reinforcement Learning

Constrained optimization is popularly seen in reinforcement learning for addressing complex control tasks. From the perspective of dynamic system, iteratively solving a constrained optimization problem can be framed as the temporal…

Machine Learning · Computer Science 2025-01-28 Tianqi Zhang, Puzhen Yuan, Guojian Zhan, Ziyu Lin, Yao Lyu, Zhenzhi Qin, Jingliang Duan, Liping Zhang, Shengbo Eben Li

Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs

In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in rewards in addition to maximizing a standard criterion. Variance related risk measures are among the most common…

Machine Learning · Computer Science 2015-03-19 Prashanth L. A., Mohammad Ghavamzadeh

Actor-Critic Policy Optimization in Partially Observable Multiagent Environments

Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function…

Machine Learning · Computer Science 2020-06-15 Sriram Srinivasan, Marc Lanctot, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, Michael Bowling

Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms

We study policy gradient (PG) for reinforcement learning in continuous time and space under the regularized exploratory formulation developed by Wang et al. (2020). We represent the gradient of the value function with respect to a given…

Machine Learning · Computer Science 2022-07-26 Yanwei Jia, Xun Yu Zhou

Value constrained model-free continuous control

The naive application of Reinforcement Learning algorithms to continuous control problems -- such as locomotion and manipulation -- often results in policies which rely on high-amplitude, high-frequency control signals, known colloquially…

Robotics · Computer Science 2019-02-14 Steven Bohez, Abbas Abdolmaleki, Michael Neunert, Jonas Buchli, Nicolas Heess, Raia Hadsell

Bayesian policy gradient and actor-critic algorithms

Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Conventional policy gradient methods use Monte-Carlo techniques to estimate the gradient, which…

Machine Learning · Computer Science 2026-05-01 Mohammad Ghavamzadeh, Yaakov Engel, Michal Valko

An Approximate Policy Iteration Viewpoint of Actor-Critic Algorithms

In this work, we consider policy-based methods for solving the reinforcement learning problem, and establish the sample complexity guarantees. A policy-based algorithm typically consists of an actor and a critic. We consider using various…

Machine Learning · Computer Science 2023-01-16 Zaiwei Chen, Siva Theja Maguluri

Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the…

Machine Learning · Computer Science 2023-11-01 Sharan Vaswani, Amirreza Kazemi, Reza Babanezhad, Nicolas Le Roux

Learning Value Functions in Deep Policy Gradients using Residual Variance

Policy gradient algorithms have proven to be successful in diverse decision making and control tasks. However, these methods suffer from high sample complexity and instability issues. In this paper, we address these challenges by providing…

Machine Learning · Computer Science 2021-03-17 Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux

DAC: The Double Actor-Critic Architecture for Learning Options

We reformulate the option framework as two parallel augmented MDPs. Under this novel formulation, all policy optimization algorithms can be used off the shelf to learn intra-option policies, option termination conditions, and a master…

Machine Learning · Computer Science 2019-09-12 Shangtong Zhang, Shimon Whiteson

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment,…

Machine Learning · Computer Science 2020-03-17 Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch

Characterizing the Gap Between Actor-Critic and Policy Gradient

Actor-critic (AC) methods are ubiquitous in reinforcement learning. Although it is understood that AC methods are closely related to policy gradient (PG), their precise connection has not been fully characterized previously. In this paper,…

Artificial Intelligence · Computer Science 2021-06-15 Junfeng Wen, Saurabh Kumar, Ramki Gummadi, Dale Schuurmans

Learning to Collaborate: Multi-Scenario Ranking via Multi-Agent Reinforcement Learning

Ranking is a fundamental and widely studied problem in scenarios such as search, advertising, and recommendation. However, joint optimization for multi-scenario ranking, which aims to improve the overall performance of several ranking…

Artificial Intelligence · Computer Science 2018-09-18 Jun Feng, Heng Li, Minlie Huang, Shichen Liu, Wenwu Ou, Zhirong Wang, Xiaoyan Zhu