Related papers: Communication-Efficient Policy Gradient Methods fo…

Distributed Policy Gradient with Variance Reduction in Multi-Agent Reinforcement Learning

This paper studies a distributed policy gradient in collaborative multi-agent reinforcement learning (MARL), where agents over a communication network aim to find the optimal policy to maximize the average of all agents' local returns. Due…

Multiagent Systems · Computer Science 2022-12-06 Xiaoxiao Zhao, Jinlong Lei, Li Li, Jie Chen

The Gradient Convergence Bound of Federated Multi-Agent Reinforcement Learning with Efficient Communication

The paper considers independent reinforcement learning (IRL) for multi-agent collaborative decision-making in the paradigm of federated learning (FL). However, FL generates excessive communication overheads between agents and a remote…

Machine Learning · Computer Science 2023-05-30 Xing Xu, Rongpeng Li, Zhifeng Zhao, Honggang Zhang

Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence

Risk-sensitive reinforcement learning (RL) is crucial for maintaining reliable performance in high-stakes applications. While traditional RL methods aim to learn a point estimate of the random cumulative cost, distributional RL (DRL) seeks…

Machine Learning · Computer Science 2025-02-03 Minheng Xiao, Xian Yu, Lei Ying

Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces

Reinforcement learning (RL) struggles to scale to large, combinatorial action spaces common in many real-world problems. This paper introduces a novel framework for training discrete diffusion models as highly effective policies in these…

Machine Learning · Computer Science 2026-05-21 Haitong Ma, Ofir Nabati, Aviv Rosenberg, Bo Dai, Oran Lang, Craig Boutilier, Na Li, Shie Mannor, Lior Shani, Guy Tenneholtz

Asynchronous Policy Gradient Aggregation for Efficient Distributed Reinforcement Learning

We study distributed reinforcement learning (RL) with policy gradient methods under asynchronous and parallel computations and communications. While non-distributed methods are well understood theoretically and have achieved remarkable…

Machine Learning · Computer Science 2026-03-31 Alexander Tyurin, Andrei Spiridonov, Varvara Rudenko

Hierarchical Reinforcement Learning with Opponent Modeling for Distributed Multi-agent Cooperation

Many real-world applications can be formulated as multi-agent cooperation problems, such as network packet routing and coordination of autonomous vehicles. The emergence of deep reinforcement learning (DRL) provides a promising approach for…

Multiagent Systems · Computer Science 2022-06-28 Zhixuan Liang, Jiannong Cao, Shan Jiang, Divya Saxena, Huafeng Xu

Scalable and Sample Efficient Distributed Policy Gradient Algorithms in Multi-Agent Networked Systems

This paper studies a class of multi-agent reinforcement learning (MARL) problems where the reward that an agent receives depends on the states of other agents, but the next state only depends on the agent's own current state and action. We…

Multiagent Systems · Computer Science 2023-05-16 Xin Liu, Honghao Wei, Lei Ying

Deep Reinforcement Learning for Distributed and Uncoordinated Cognitive Radios Resource Allocation

This paper presents a novel deep reinforcement learning-based resource allocation technique for the multi-agent environment presented by a cognitive radio network where the interactions of the agents during learning may lead to a…

Machine Learning · Computer Science 2022-05-30 Ankita Tondwalkar, Andres Kwasinski

ORVIT: Near-Optimal Online Distributionally Robust Reinforcement Learning

We investigate reinforcement learning (RL) in the presence of distributional mismatch between training and deployment, where policies trained in simulators often underperform in practice due to mismatches between training and deployment…

Machine Learning · Computer Science 2025-11-12 Debamita Ghosh, George K. Atia, Yue Wang

Guided Dialog Policy Learning without Adversarial Learning in the Loop

Reinforcement Learning (RL) methods have emerged as a popular choice for training an efficient and effective dialogue policy. However, these methods suffer from sparse and unstable reward signals returned by a user simulator only when a…

Artificial Intelligence · Computer Science 2020-09-18 Ziming Li, Sungjin Lee, Baolin Peng, Jinchao Li, Julia Kiseleva, Maarten de Rijke, Shahin Shayandeh, Jianfeng Gao

Distributed Policy Gradient for Linear Quadratic Networked Control with Limited Communication Range

This paper proposes a scalable distributed policy gradient method and proves its convergence to near-optimal solution in multi-agent linear quadratic networked systems. The agents engage within a specified network under local communication…

Multiagent Systems · Computer Science 2024-03-06 Yuzi Yan, Yuan Shen

A Deep Reinforcement Learning Approach to Efficient Distributed Optimization

In distributed optimization, the practical problem-solving performance is essentially sensitive to algorithm selection, parameter setting, problem type and data pattern. Thus, it is often laborious to acquire a highly efficient method for a…

Optimization and Control · Mathematics 2024-01-04 Daokuan Zhu, Tianqi Xu, Jie Lu

Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

Humans are masters at quickly learning many complex tasks, relying on an approximate understanding of the dynamics of their environments. In much the same way, we would like our learning agents to quickly adapt to new tasks. In this paper,…

Robotics · Computer Science 2019-10-10 Arunkumar Byravan, Jost Tobias Springenberg, Abbas Abdolmaleki, Roland Hafner, Michael Neunert, Thomas Lampe, Noah Siegel, Nicolas Heess, Martin Riedmiller

Evolved Policy Gradients

We propose a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent, which optimizes its policy to minimize this loss, will achieve…

Machine Learning · Computer Science 2018-05-01 Rein Houthooft, Richard Y. Chen, Phillip Isola, Bradly C. Stadie, Filip Wolski, Jonathan Ho, Pieter Abbeel

Multi-task Reinforcement Learning in Reproducing Kernel Hilbert Spaces via Cross-learning

Reinforcement learning (RL) is a framework to optimize a control policy using rewards that are revealed by the system as a response to a control action. In its standard form, RL involves a single agent that uses its policy to accomplish a…

Systems and Control · Electrical Eng. & Systems 2021-11-24 Juan Cervino, Juan Andres Bazerque, Miguel Calvo-Fullana, Alejandro Ribeiro

Align and Filter: Improving Performance in Asynchronous On-Policy RL

Distributed training and increasing the gradient update frequency are practical strategies to accelerate learning and improve performance, but both exacerbate a central challenge: policy lag, which is the mismatch between the…

Machine Learning · Computer Science 2026-03-03 Homayoun Honari, Roger Creus Castanyer, Michael Przystupa, Michael Noukhovitch, Pablo Samuel Castro, Glen Berseth

Distributed Reinforcement Learning for Flexible and Efficient UAV Swarm Control

Over the past few years, the use of swarms of Unmanned Aerial Vehicles (UAVs) in monitoring and remote area surveillance applications has become widespread thanks to the price reduction and the increased capabilities of drones. The drones…

Machine Learning · Computer Science 2021-03-09 Federico Venturini, Federico Mason, Francesco Pase, Federico Chiariotti, Alberto Testolin, Andrea Zanella, Michele Zorzi

Distributed Deep Reinforcement Learning: An Overview

Deep reinforcement learning (DRL) is a very active research area. However, several technical and scientific issues need to be addressed, among them data inefficiency, the exploration-exploitation trade-off, and multi-task…

Machine Learning · Computer Science 2020-11-24 Mohammad Reza Samsami, Hossein Alimadad

End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient

Learning a goal-oriented dialog policy is generally performed offline with supervised learning algorithms or online with reinforcement learning (RL). Additionally, as companies accumulate massive quantities of dialog transcripts between…

Artificial Intelligence · Computer Science 2017-12-11 Li Zhou, Kevin Small, Oleg Rokhlenko, Charles Elkan

Natural Policy Gradient and Actor Critic Methods for Constrained Multi-Task Reinforcement Learning

Multi-task reinforcement learning (RL) aims to find a single policy that effectively solves multiple tasks at the same time. This paper presents a constrained formulation for multi-task RL where the goal is to maximize the average…

Optimization and Control · Mathematics 2024-05-07 Sihan Zeng, Thinh T. Doan, Justin Romberg