Related papers: Policy Optimization as Wasserstein Gradient Flows

Wasserstein Formulation of Reinforcement Learning. An Optimal Transport Perspective on Policy Optimization

We present a geometric framework for Reinforcement Learning (RL) that views policies as maps into the Wasserstein space of action probabilities. First, we define a Riemannian structure induced by stationary distributions, proving its…

Machine Learning · Computer Science 2026-04-17 Mathias Dus

A note on convergence of Wasserstein policy optimization

Wasserstein Policy Optimization (WPO) is a recently proposed reinforcement learning algorithm that leverages Wasserstein gradient flows to optimize stochastic policies in continuous action spaces. Despite its empirical success, the…

Machine Learning · Computer Science 2026-05-22 David Šiška, Yufei Zhang

Wasserstein Policy Optimization

We introduce Wasserstein Policy Optimization (WPO), an actor-critic algorithm for reinforcement learning in continuous action spaces. WPO can be derived as an approximation to Wasserstein gradient flow over the space of all policies…

Machine Learning · Computer Science 2025-05-02 David Pfau, Ian Davies, Diana Borsa, Joao G. M. Araujo, Brendan Tracey, Hado van Hasselt

Efficient Wasserstein Natural Gradients for Reinforcement Learning

A novel optimization approach is proposed for application to policy gradient methods and evolution strategies for reinforcement learning (RL). The procedure uses a computationally efficient Wasserstein natural gradient (WNG) descent that…

Machine Learning · Computer Science 2021-03-19 Ted Moskovitz, Michael Arbel, Ferenc Huszar, Arthur Gretton

Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits

Off-policy evaluation and learning are concerned with assessing a given policy and learning an optimal policy from offline data without direct interaction with the environment. Often, the environment in which the data are collected differs…

Machine Learning · Computer Science 2024-01-18 Yi Shen, Pan Xu, Michael M. Zavlanos

Wasserstein Gradient Flows for Optimizing Gaussian Mixture Policies

Robots often rely on a repertoire of previously-learned motion policies for performing tasks of diverse complexities. When facing unseen task conditions or when new task requirements arise, robots must adapt their motion policies…

Machine Learning · Computer Science 2023-05-18 Hanna Ziesche, Leonel Rozo

Policy Gradients for Probabilistic Constrained Reinforcement Learning

This paper considers the problem of learning safe policies in the context of reinforcement learning (RL). In particular, we consider the notion of probabilistic safety. That is, we aim to design policies that maintain the state of the…

Machine Learning · Computer Science 2023-04-20 Weiqin Chen, Dharmashankar Subramanian, Santiago Paternain

Acceleration in Policy Optimization

We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates. Leveraging the connection between…

Machine Learning · Computer Science 2023-09-07 Veronica Chelu, Tom Zahavy, Arthur Guez, Doina Precup, Sebastian Flennerhag

Variational Analysis in the Wasserstein Space

We study optimization problems whereby the optimization variable is a probability measure. Since the probability space is not a vector space, many classical and powerful methods for optimization (e.g., gradients) are of little help. Thus,…

Optimization and Control · Mathematics 2024-06-18 Nicolas Lanzetti, Antonio Terpin, Florian Dörfler

Wasserstein Distributionally Robust Optimization: Theory and Applications in Machine Learning

Many decision problems in science, engineering and economics are affected by uncertain parameters whose distribution is only indirectly observable through samples. The goal of data-driven decision-making is to learn a decision from finitely…

Machine Learning · Statistics 2024-11-05 Daniel Kuhn, Peyman Mohajerin Esfahani, Viet Anh Nguyen, Soroosh Shafieezadeh-Abadeh

Flow-Based Policy for Online Reinforcement Learning

We present FlowRL, a novel framework for online reinforcement learning that integrates flow-based policy representation with Wasserstein-2-regularized optimization. We argue that in addition to training signals, enhancing the…

Machine Learning · Computer Science 2025-06-17 Lei Lv, Yunfei Li, Yu Luo, Fuchun Sun, Tao Kong, Jiafeng Xu, Xiao Ma

Constrained Variational Policy Optimization for Safe Reinforcement Learning

Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before deploying them to safety-critical applications. Previous primal-dual style approaches suffer from instability issues and lack optimality…

Machine Learning · Computer Science 2022-06-20 Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Zhiwei Steven Wu, Bo Li, Ding Zhao

Modeling of Political Systems using Wasserstein Gradient Flows

The study of complex political phenomena such as parties' polarization calls for mathematical models of political systems. In this paper, we aim at modeling the time evolution of a political system whereby various parties selfishly interact…

Systems and Control · Electrical Eng. & Systems 2022-09-13 Nicolas Lanzetti, Joudi Hajar, Florian Dörfler

Offline Reinforcement Learning with Wasserstein Regularization via Optimal Transport Maps

Offline reinforcement learning (RL) aims to learn an optimal policy from a static dataset, making it particularly valuable in scenarios where data collection is costly, such as robotics. A major challenge in offline RL is distributional…

Machine Learning · Computer Science 2025-07-16 Motoki Omura, Yusuke Mukuta, Kazuki Ota, Takayuki Osa, Tatsuya Harada

Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization

Recent advancements in reinforcement learning (RL) have achieved great success in fine-tuning diffusion-based generative models. However, fine-tuning continuous flow-based generative models to align with arbitrary user-defined reward…

Machine Learning · Computer Science 2025-02-11 Jiajun Fan, Shuaike Shen, Chaoran Cheng, Yuxin Chen, Chumeng Liang, Ge Liu

Robust Risk-Aware Reinforcement Learning

We present a reinforcement learning (RL) approach for robust optimisation of risk-aware performance criteria. To allow agents to express a wide variety of risk-reward profiles, we assess the value of a policy using rank-dependent expected…

Machine Learning · Computer Science 2021-12-16 Sebastian Jaimungal, Silvana Pesenti, Ye Sheng Wang, Hariom Tatsat

Local policy search with Bayesian optimization

Reinforcement learning (RL) aims to find an optimal policy by interaction with an environment. Consequently, learning complex behavior requires a vast number of samples, which can be prohibitive in practice. Nevertheless, instead of…

Machine Learning · Computer Science 2021-11-23 Sarah Müller, Alexander von Rohr, Sebastian Trimpe

Wasserstein Proximal Policy Gradient

We study policy gradient methods for continuous-action, entropy-regularized reinforcement learning through the lens of Wasserstein geometry. Starting from a Wasserstein proximal update, we derive Wasserstein Proximal Policy Gradient (WPPG)…

Machine Learning · Computer Science 2026-03-04 Zhaoyu Zhu, Shuhan Zhang, Rui Gao, Shuang Li

A policy gradient approach for optimization of smooth risk measures

We propose policy gradient algorithms for solving a risk-sensitive reinforcement learning (RL) problem in on-policy as well as off-policy settings. We consider episodic Markov decision processes, and model the risk using the broad class of…

Machine Learning · Computer Science 2024-06-25 Nithia Vijayan, Prashanth L. A

Learning from Scarce Experience

Searching the space of policies directly for the optimal policy has been one popular method for solving partially observable reinforcement learning problems. Typically, with each change of the target policy, its value is estimated from the…

Artificial Intelligence · Computer Science 2007-05-23 Leonid Peshkin, Christian R. Shelton