Related papers: Dynamic mean field programming
Multi-agent reinforcement learning methods have shown remarkable potential in solving complex multi-agent problems but mostly lack theoretical guarantees. Recently, mean field control and mean field games have been established as a…
Sequential decision making in the presence of uncertainty and stochastic dynamics gives rise to distributions over state/action trajectories in reinforcement learning (RL) and optimal control problems. This observation has led to a variety…
We investigate reinforcement learning in the setting of Markov decision processes for a large number of exchangeable agents interacting in a mean field manner. Applications include, for example, the control of a large number of robots…
Mean-field reinforcement learning (MF-RL) scales multi-agent RL to large populations by reducing each agent's dependence on others to a single summary statistic: the mean action. However, this reduction requires every agent to act at…
We consider the problem of representing collective behavior of large populations and predicting the evolution of a population distribution over a discrete state space. A discrete time mean field game (MFG) is motivated as an interpretable…
We propose a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions. We derive several new algorithms for dynamic programming and…
In the present work, we study deterministic mean field games (MFGs) with finite time horizon in which the dynamics of a generic agent are controlled by the acceleration. They are described by a system of PDEs coupling a continuity equation…
We formulate a class of mean field games on a finite state space with variational principles resembling those in continuous-state mean field games. We construct a controlled continuity equation featuring a nonlinear activation function on…
We study a specific class of finite-horizon mean field optimal stopping problems by means of the dynamic programming approach. In particular, we consider problems where the state process is not affected by the stopping time. Such problems…
A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below…
The theory of mean field games aims at studying deterministic or stochastic differential games (Nash equilibria) as the number of agents tends to infinity. Since very few mean field games have explicit or semi-explicit solutions, numerical…
We study infinite horizon discounted Mean Field Control (MFC) problems with common noise through the lens of Mean Field Markov Decision Processes (MFMDP). We allow the agents to use actions that are randomized not only at the individual…
We study the optimal stopping problem of McKean-Vlasov diffusions when the criterion is a function of the law of the stopped process. A remarkable new feature in this setting is that the stopping time also impacts the dynamics of the…
A Markov decision process can be parameterized by a transition kernel and a reward function. Both play essential roles in the study of reinforcement learning as evidenced by their presence in the Bellman equations. In our inquiry of various…
We study policy gradient for mean-field control in continuous time in a reinforcement learning setting. By considering randomised policies with entropy regularisation, we derive a gradient expectation representation of the value function,…
Mean field games (MFGs) have emerged as a powerful framework for modeling interactions in large-scale multi-agent systems. Despite recent advancements in reinforcement learning (RL) for MFGs, existing methods are typically limited to finite…
Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. When the number of agents grows large, learning becomes intractable due to the curse of dimensionality and the exponential…
We describe a nonlinear generalization of dual dynamic programming theory and its application to value function estimation for deterministic control problems over continuous state and action spaces, in a discrete-time infinite horizon…
We consider deterministic mean field games where the dynamics of a typical agent are non-linear with respect to the state variable and affine with respect to the control variable. Particular instances of the problem considered here are mean…
Traditional reinforcement learning (RL) assumes the agents make decisions based on Markov decision processes (MDPs) with one-step transition models. In many real-world applications, such as energy management and stock investment, agents can…