Related papers: Communication Efficient Parallel Reinforcement Lea…

Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning

We consider an agent interacting with an environment in a single stream of actions, observations, and rewards, with no reset. This process is not assumed to be a Markov Decision Process (MDP). Rather, the agent has several representations…

Machine Learning · Computer Science 2013-03-19 Odalric-Ambrym Maillard, Phuong Nguyen, Ronald Ortner, Daniil Ryabko

Provably Efficient Multi-Agent Reinforcement Learning with Fully Decentralized Communication

A challenge in reinforcement learning (RL) is minimizing the cost of sampling associated with exploration. Distributed exploration reduces sampling complexity in multi-agent RL (MARL). We investigate the benefits to performance in MARL when…

Machine Learning · Computer Science 2022-05-03 Justin Lidard, Udari Madhushani, Naomi Ehrich Leonard

Cooperative Multi-Agent Reinforcement Learning: Asynchronous Communication and Linear Function Approximation

We study multi-agent reinforcement learning in the setting of episodic Markov decision processes, where multiple agents cooperate via communication through a central server. We propose a provably efficient algorithm based on value iteration…

Machine Learning · Computer Science 2023-06-27 Yifei Min, Jiafan He, Tianhao Wang, Quanquan Gu

Regret Bounds for Decentralized Learning in Cooperative Multi-Agent Dynamical Systems

Regret analysis is challenging in Multi-Agent Reinforcement Learning (MARL) primarily due to the dynamical environments and the decentralized information among agents. We attempt to solve this challenge in the context of decentralized…

Machine Learning · Computer Science 2020-01-29 Seyed Mohammad Asghari, Yi Ouyang, Ashutosh Nayyar

Transfer in Reinforcement Learning via Regret Bounds for Learning Agents

We present an approach for the quantification of the usefulness of transfer in reinforcement learning via regret bounds for a multi-agent setting. Considering a number of $\aleph$ agents operating in the same Markov decision process,…

Machine Learning · Computer Science 2025-11-14 Adrienne Tuynman, Ronald Ortner

Regret-Minimization Algorithms for Multi-Agent Cooperative Learning Systems

A Multi-Agent Cooperative Learning (MACL) system is an artificial intelligence (AI) system where multiple learning agents work together to complete a common task. Recent empirical success of MACL systems in various domains (e.g. traffic…

Machine Learning · Computer Science 2023-10-31 Jialin Yi

Collaborative Multi-agent Stochastic Linear Bandits

We study a collaborative multi-agent stochastic linear bandit setting, where $N$ agents that form a network communicate locally to minimize their overall regret. In this setting, each agent has its own linear bandit problem (its own reward…

Machine Learning · Computer Science 2022-05-16 Ahmadreza Moradipari, Mohammad Ghavamzadeh, Mahnoosh Alizadeh

Learning in Markov Decision Processes under Constraints

We consider reinforcement learning (RL) in Markov Decision Processes in which an agent repeatedly interacts with an environment that is modeled by a controlled Markov process. At each time step $t$, it earns a reward, and also incurs a…

Machine Learning · Computer Science 2023-03-16 Rahul Singh, Abhishek Gupta, Ness B. Shroff

Model-Based Reinforcement Learning with Double Oracle Efficiency in Policy Optimization and Offline Estimation

Reinforcement learning (RL) in large environments often suffers from severe computational bottlenecks, as conventional regret minimization algorithms require repeated, costly calls to planning and statistical estimation oracles. While…

Machine Learning · Computer Science 2026-05-04 Haichen Hu, Jian Qian, David Simchi-Levi

Social Learning in Multi Agent Multi Armed Bandits

In this paper, we introduce a distributed version of the classical stochastic Multi-Arm Bandit (MAB) problem. Our setting consists of a large number of agents $n$ that collaboratively and simultaneously solve the same instance of $K$ armed…

Machine Learning · Computer Science 2019-11-06 Abishek Sankararaman, Ayalvadi Ganesh, Sanjay Shakkottai

Learning to Collaborate in Markov Decision Processes

We consider a two-agent MDP framework where agents repeatedly solve a task in a collaborative setting. We study the problem of designing a learning algorithm for the first agent (A1) that facilitates a successful collaboration even in cases…

Machine Learning · Computer Science 2019-06-21 Goran Radanovic, Rati Devidze, David C. Parkes, Adish Singla

Distributed Linear Bandits under Communication Constraints

We consider distributed linear bandits where $M$ agents learn collaboratively to minimize the overall cumulative regret incurred by all agents. Information exchange is facilitated by a central server, and both the uplink and downlink…

Machine Learning · Computer Science 2025-11-17 Sudeep Salgia, Qing Zhao

Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits

We study agents communicating over an underlying network by exchanging messages, in order to optimize their individual regret in a common nonstochastic multi-armed bandit problem. We derive regret minimization algorithms that guarantee for…

Machine Learning · Computer Science 2019-11-19 Yogev Bar-On, Yishay Mansour

Reinforcement Learning algorithms for regret minimization in structured Markov Decision Processes

A recent goal in the Reinforcement Learning (RL) framework is to choose a sequence of actions or a policy to maximize the reward collected or minimize the regret incurred in a finite time horizon. For several RL problems in operation…

Machine Learning · Computer Science 2016-08-18 K J Prabuchandran, Tejas Bodas, Theja Tulabandhula

Reinforcement learning for quantum processes with memory

In reinforcement learning, an agent interacts sequentially with an environment to maximize a reward, receiving only partial, probabilistic feedback. This creates a fundamental exploration-exploitation trade-off: the agent must explore to…

Quantum Physics · Physics 2026-03-27 Josep Lumbreras, Ruo Cheng Huang, Yanglin Hu, Marco Fanizza, Mile Gu

Accelerating Distributed Online Meta-Learning via Multi-Agent Collaboration under Limited Communication

Online meta-learning is emerging as an enabling technique for achieving edge intelligence in the IoT ecosystem. Nevertheless, to learn a good meta-model for within-task fast adaptation, a single agent alone has to learn over many tasks, and…

Machine Learning · Computer Science 2020-12-22 Sen Lin, Mehmet Dedeoglu, Junshan Zhang

Asymptotically optimal regret in communicating Markov decision processes

In this paper, we present a learning algorithm that achieves asymptotically optimal regret for Markov decision processes in average reward under a communicating assumption. That is, given a communicating Markov decision process $M$, our…

Machine Learning · Computer Science 2025-05-26 Victor Boone

Communication-Efficient Collaborative Regret Minimization in Multi-Armed Bandits

In this paper, we study the collaborative learning model, which concerns the tradeoff between parallelism and communication overhead in multi-agent multi-armed bandits. For regret minimization in multi-armed bandits, we present the first…

Machine Learning · Computer Science 2023-12-22 Nikolai Karpov, Qin Zhang

The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits

We consider a decentralized multi-agent Multi Armed Bandit (MAB) setup consisting of $N$ agents, solving the same MAB instance to minimize individual cumulative regret. In our model, agents collaborate by exchanging messages through…

Machine Learning · Computer Science 2024-07-04 Ronshee Chawla, Abishek Sankararaman, Ayalvadi Ganesh, Sanjay Shakkottai

Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning

Motivated by real-world settings where data collection and policy deployment -- whether for a single agent or across multiple agents -- are costly, we study the problem of on-policy single-agent reinforcement learning (RL) and federated RL…

Machine Learning · Statistics 2026-03-11 Haochen Zhang, Zhong Zheng, Lingzhou Xue