Related papers: Neural Replicator Dynamics

Interpolating Between Softmax Policy Gradient and Neural Replicator Dynamics with Capped Implicit Exploration

Neural replicator dynamics (NeuRD) is an alternative to the foundational softmax policy gradient (SPG) algorithm motivated by online learning and evolutionary game theory. The NeuRD expected update is designed to be nearly identical to that…

Machine Learning · Computer Science · 2022-06-07 · Dustin Morrill, Esra'a Saleh, Michael Bowling, Amy Greenwald

Actor-Critic Policy Optimization in Partially Observable Multiagent Environments

Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function…

Machine Learning · Computer Science · 2020-06-15 · Sriram Srinivasan, Marc Lanctot, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, Michael Bowling

Policy Evaluation and Seeking for Multi-Agent Reinforcement Learning via Best Response

This paper introduces two metrics (cycle-based and memory-based metrics), grounded in a dynamical game-theoretic solution concept called sink equilibrium, for the evaluation, ranking, and computation of policies in multi-agent learning. We…

Computer Science and Game Theory · Computer Science · 2020-06-23 · Rui Yan, Xiaoming Duan, Zongying Shi, Yisheng Zhong, Jason R. Marden, Francesco Bullo

On Gradient-Based Learning in Continuous Games

We formulate a general framework for competitive gradient-based learning that encompasses a wide breadth of multi-agent learning algorithms, and analyze the limiting behavior of competitive gradient-based learning algorithms using dynamical…

Machine Learning · Computer Science · 2020-02-21 · Eric Mazumdar, Lillian J. Ratliff, S. Shankar Sastry

Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model

In high-stakes scenarios like medical treatment and auto-piloting, it is risky or even infeasible to collect online experimental data to train the agent. Simulation-based training can alleviate this issue, but may suffer from its inherent…

Machine Learning · Computer Science · 2022-03-16 · Jialian Li, Tongzheng Ren, Dong Yan, Hang Su, Jun Zhu

Solving Robust MDPs through No-Regret Dynamics

Reinforcement Learning is a powerful framework for training agents to navigate different situations, but it is susceptible to changes in environmental dynamics. However, solving Markov Decision Processes that are robust to changes is…

Machine Learning · Computer Science · 2024-06-21 · Etash Kumar Guha

Recurrent Natural Policy Gradient for POMDPs

Solving partially observable Markov decision processes (POMDPs) remains a fundamental challenge in reinforcement learning (RL), primarily due to the curse of dimensionality induced by the non-stationarity of optimal policies. In this work,…

Optimization and Control · Mathematics · 2025-10-20 · Semih Cayci, Atilla Eryilmaz

Policy-Gradient Algorithms Have No Guarantees of Convergence in Linear Quadratic Games

We show by counterexample that policy-gradient algorithms have no guarantees of even local convergence to Nash equilibria in continuous action and state space multi-agent settings. To do so, we analyze gradient-play in N-player general-sum…

Machine Learning · Computer Science · 2019-12-18 · Eric Mazumdar, Lillian J. Ratliff, Michael I. Jordan, S. Shankar Sastry

Experience-replay Innovative Dynamics

Despite its groundbreaking success, multi-agent reinforcement learning (MARL) still suffers from instability and nonstationarity. Replicator dynamics, the most well-known model from evolutionary game theory (EGT), provide a theoretical…

Machine Learning · Computer Science · 2025-01-28 · Tuo Zhang, Leonardo Stella, Julian Barreiro-Gomez

Optimizing for the Future in Non-Stationary MDPs

Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, the underlying Markov decision process is stationary. However, in many real-world applications, this…

Machine Learning · Computer Science · 2020-09-23 · Yash Chandak, Georgios Theocharous, Shiv Shankar, Martha White, Sridhar Mahadevan, Philip S. Thomas

Neural Robot Dynamics

Accurate and efficient simulation of modern robots remains challenging due to their high degrees of freedom and intricate mechanisms. Neural simulators have emerged as a promising alternative to traditional analytical simulators, capable of…

Robotics · Computer Science · 2025-08-22 · Jie Xu, Eric Heiden, Iretiayo Akinola, Dieter Fox, Miles Macklin, Yashraj Narang

Minimax Iterative Dynamic Game: Application to Nonlinear Robot Control Tasks

Multistage decision policies provide useful control strategies in high-dimensional state spaces, particularly in complex control tasks. However, they exhibit weak performance guarantees in the presence of disturbance, model mismatch, or…

Robotics · Computer Science · 2018-08-07 · Olalekan Ogunmolu, Nicholas Gans, Tyler Summers

A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning

A fundamental challenge in multiagent reinforcement learning is to learn beneficial behaviors in a shared environment with other simultaneously learning agents. In particular, each agent perceives the environment as effectively…

Machine Learning · Computer Science · 2021-06-15 · Dong-Ki Kim, Miao Liu, Matthew Riemer, Chuangchuang Sun, Marwa Abdulhai, Golnaz Habibi, Sebastian Lopez-Cot, Gerald Tesauro, Jonathan P. How

Reinforcement Learning in Linear Quadratic Deep Structured Teams: Global Convergence of Policy Gradient Methods

In this paper, we study the global convergence of model-based and model-free policy gradient descent and natural policy gradient descent algorithms for linear quadratic deep structured teams. In such systems, agents are partitioned into a…

Multiagent Systems · Computer Science · 2020-12-16 · Vida Fathi, Jalal Arabneydi, Amir G. Aghdam

High-Dimensional Continuous Control Using Generalized Advantage Estimation

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks. The two main…

Machine Learning · Computer Science · 2018-10-23 · John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel

Symmetric (Optimistic) Natural Policy Gradient for Multi-agent Learning with Parameter Convergence

Multi-agent interactions are increasingly important in the context of reinforcement learning, and the theoretical foundations of policy gradient methods have attracted surging research interest. We investigate the global convergence of…

Optimization and Control · Mathematics · 2023-03-21 · Sarath Pattathil, Kaiqing Zhang, Asuman Ozdaglar

An Approximate Policy Iteration Viewpoint of Actor-Critic Algorithms

In this work, we consider policy-based methods for solving the reinforcement learning problem, and establish sample complexity guarantees. A policy-based algorithm typically consists of an actor and a critic. We consider using various…

Machine Learning · Computer Science · 2023-01-16 · Zaiwei Chen, Siva Theja Maguluri

Non-Uniform Noise-to-Signal Ratio in the REINFORCE Policy-Gradient Estimator

Policy-gradient methods are widely used in reinforcement learning, yet training often becomes unstable or slows down as learning progresses. We study this phenomenon through the noise-to-signal ratio (NSR) of a policy-gradient estimator,…

Optimization and Control · Mathematics · 2026-02-10 · Haoyu Han, Heng Yang

Policy Gradient for Continuing Tasks in Non-stationary Markov Decision Processes

Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities. In this paper we consider the problem of finding optimal…

Machine Learning · Computer Science · 2020-10-19 · Santiago Paternain, Juan Andres Bazerque, Alejandro Ribeiro

Elementary Analysis of Policy Gradient Methods

Projected policy gradient under the simplex parameterization, and policy gradient and natural policy gradient under the softmax parameterization, are fundamental algorithms in reinforcement learning. There has been a flurry of recent…

Optimization and Control · Mathematics · 2024-04-12 · Jiacai Liu, Wenye Li, Ke Wei