Related papers: Continuous-Time Distributed Dynamic Programming fo…
The paper considers a class of multi-agent Markov decision processes (MDPs), in which the network agents respond differently (as manifested by the instantaneous one-stage random costs) to a global controlled state and the control actions of…
We present a principled and efficient planning algorithm for collaborative multiagent dynamical systems. All computation, during both the planning and the execution phases, is distributed among the agents; each agent only needs to model and…
Multi-agent planning under stochastic dynamics is usually formalised using decentralized (partially observable) Markov decision processes ( MDPs) and reachability or expected reward specifications. In this paper, we propose a different…
In this paper, we propose two novel decentralized optimization frameworks for multi-agent nonlinear optimal control problems in robotics. The aim of this work is to suggest architectures that inherit the computational efficiency and…
In this work, we study dynamic programming (DP) algorithms for partially observable Markov decision processes with jointly continuous and discrete state-spaces. We consider a class of stochastic systems which have coupled discrete and…
Multi-agent planning in stochastic environments can be framed formally as a decentralized Markov decision problem. Many real-life distributed problems that arise in manufacturing, multi-robot coordination and information gathering scenarios…
Markov Decision Process (MDP) presents a mathematical framework to formulate the learning processes of agents in reinforcement learning. MDP is limited by the Markovian assumption that a reward only depends on the immediate state and…
Network Markov Decision Processes (MDPs), a popular model for multi-agent control, pose a significant challenge to efficient learning due to the exponential growth of the global state-action space with the number of agents. In this work,…
Distributed decision making in multi-agent networks has recently attracted significant research attention thanks to its wide applicability, e.g. in the management and optimization of computer networks, power systems, robotic teams, sensor…
Min-max problems are important in multi-agent sequential decision-making because they improve the performance of the worst-performing agent in the network. However, solving the multi-agent min-max problem is challenging. We propose a…
This paper proposes a multi-scale method to design a continuous-time distributed algorithm for constrained convex optimization problems by using multi-agents with Markov switched network dynamics and noisy inter-agent communications. Unlike…
In this work, we consider a cooperative multi-agent Markov decision process (MDP) involving m agents. At each decision epoch, all the m agents independently select actions in order to maximize a common long-term objective. In the policy…
Consider a multi-agent system in a dynamic and uncertain environment. Each agent's local decision problem is modeled as a Markov decision process (MDP) and agents must coordinate on a joint action in each period, which provides a reward to…
In important applications involving multi-task networks with multiple objectives, agents in the network need to decide between these multiple objectives and reach an agreement about which single objective to follow for the network. In this…
Missions for autonomous systems often require agents to visit multiple targets in complex operating conditions. This work considers the problem of visiting a set of targets in minimum time by a team of non-communicating agents in a Markov…
This paper introduces a new approach that leverages Multi-agent Bayesian Optimization (MABO) to design Distributed Model Predictive Control (DMPC) schemes for multi-agent systems. The primary objective is to learn optimal DMPC schemes even…
General purpose intelligent learning agents cycle through (complex,non-MDP) sequences of observations, actions, and rewards. On the other hand, reinforcement learning is well-developed for small finite state Markov Decision Processes…
We study online learning in episodic constrained Markov decision processes (CMDPs), where the learner aims at collecting as much reward as possible over the episodes, while satisfying some long-term constraints during the learning process.…
We develop a discrete-time version of the blended dynamics theorem for the use of designing distributed computation algorithms. The blended dynamics theorem enables to predict the behavior of heterogeneous multi-agent systems. Therefore,…
The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for a class of multi-agent Markov decision processes (MDPs). The temporal-difference (TD) learning is a reinforcement…