English
Related papers

Related papers: Multi-Preference Actor Critic

200 papers

Multi-task reinforcement learning (RL) aims to find a single policy that effectively solves multiple tasks at the same time. This paper presents a constrained formulation for multi-task RL where the goal is to maximize the average…

Optimization and Control · Mathematics 2024-05-07 Sihan Zeng , Thinh T. Doan , Justin Romberg

In cooperative stochastic games multiple agents work towards learning joint optimal actions in an unknown environment to achieve a common goal. In many real-world applications, however, constraints are often imposed on the actions that can…

Multiagent Systems · Computer Science 2020-07-14 Raghuram Bharadwaj Diddigi , Sai Koti Reddy Danda , Prabuchandran K. J. , Shalabh Bhatnagar

We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent's explicit representation of all action values to estimate the gradient…

Reinforcement learning in multi-agent scenarios is important for real-world applications but presents challenges beyond those seen in single-agent settings. We present an actor-critic algorithm that trains decentralized policies in…

Machine Learning · Computer Science 2019-05-29 Shariq Iqbal , Fei Sha

Policy gradient methods in actor-critic reinforcement learning (RL) have become perhaps the most promising approaches to solving continuous optimal control problems. However, the trial-and-error nature of RL and the inherent randomness…

Machine Learning · Computer Science 2024-04-19 Ruofan Wu , Junmin Zhong , Jennie Si

Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and improvement via two separate function approximators. The practicality of this approach comes at the expense of training instability, caused…

Machine Learning · Computer Science 2024-06-11 Bahareh Tasdighi , Abdullah Akgül , Manuel Haussmann , Kenny Kazimirzak Brink , Melih Kandemir

Reinforcement learning, mathematically described by Markov Decision Problems, may be approached either through dynamic programming or policy search. Actor-critic algorithms combine the merits of both approaches by alternating between steps…

Machine Learning · Computer Science 2023-01-31 Harshat Kumar , Alec Koppel , Alejandro Ribeiro

Constrained optimization is popularly seen in reinforcement learning for addressing complex control tasks. From the perspective of dynamic system, iteratively solving a constrained optimization problem can be framed as the temporal…

Machine Learning · Computer Science 2025-01-28 Tianqi Zhang , Puzhen Yuan , Guojian Zhan , Ziyu Lin , Yao Lyu , Zhenzhi Qin , Jingliang Duan , Liping Zhang , Shengbo Eben Li

In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in rewards in addition to maximizing a standard criterion. Variance related risk measures are among the most common…

Machine Learning · Computer Science 2015-03-19 Prashanth L. A. , Mohammad Ghavamzadeh

Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function…

Machine Learning · Computer Science 2020-06-15 Sriram Srinivasan , Marc Lanctot , Vinicius Zambaldi , Julien Perolat , Karl Tuyls , Remi Munos , Michael Bowling

We study policy gradient (PG) for reinforcement learning in continuous time and space under the regularized exploratory formulation developed by Wang et al. (2020). We represent the gradient of the value function with respect to a given…

Machine Learning · Computer Science 2022-07-26 Yanwei Jia , Xun Yu Zhou

The naive application of Reinforcement Learning algorithms to continuous control problems -- such as locomotion and manipulation -- often results in policies which rely on high-amplitude, high-frequency control signals, known colloquially…

Robotics · Computer Science 2019-02-14 Steven Bohez , Abbas Abdolmaleki , Michael Neunert , Jonas Buchli , Nicolas Heess , Raia Hadsell

Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Conventional policy gradient methods use Monte-Carlo techniques to estimate the gradient, which…

Machine Learning · Computer Science 2026-05-01 Mohammad Ghavamzadeh , Yaakov Engel , Michal Valko

In this work, we consider policy-based methods for solving the reinforcement learning problem, and establish the sample complexity guarantees. A policy-based algorithm typically consists of an actor and a critic. We consider using various…

Machine Learning · Computer Science 2023-01-16 Zaiwei Chen , Siva Theja Maguluri

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the…

Machine Learning · Computer Science 2023-11-01 Sharan Vaswani , Amirreza Kazemi , Reza Babanezhad , Nicolas Le Roux

Policy gradient algorithms have proven to be successful in diverse decision making and control tasks. However, these methods suffer from high sample complexity and instability issues. In this paper, we address these challenges by providing…

Machine Learning · Computer Science 2021-03-17 Yannis Flet-Berliac , Reda Ouhamma , Odalric-Ambrym Maillard , Philippe Preux

We reformulate the option framework as two parallel augmented MDPs. Under this novel formulation, all policy optimization algorithms can be used off the shelf to learn intra-option policies, option termination conditions, and a master…

Machine Learning · Computer Science 2019-09-12 Shangtong Zhang , Shimon Whiteson

We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment,…

Machine Learning · Computer Science 2020-03-17 Ryan Lowe , Yi Wu , Aviv Tamar , Jean Harb , Pieter Abbeel , Igor Mordatch

Actor-critic (AC) methods are ubiquitous in reinforcement learning. Although it is understood that AC methods are closely related to policy gradient (PG), their precise connection has not been fully characterized previously. In this paper,…

Artificial Intelligence · Computer Science 2021-06-15 Junfeng Wen , Saurabh Kumar , Ramki Gummadi , Dale Schuurmans

Ranking is a fundamental and widely studied problem in scenarios such as search, advertising, and recommendation. However, joint optimization for multi-scenario ranking, which aims to improve the overall performance of several ranking…

Artificial Intelligence · Computer Science 2018-09-18 Jun Feng , Heng Li , Minlie Huang , Shichen Liu , Wenwu Ou , Zhirong Wang , Xiaoyan Zhu
‹ Prev 1 2 3 10 Next ›