Related papers: Bregman Gradient Policy Optimization

Policy Optimization with Stochastic Mirror Descent

Improving sample efficiency has been a longstanding goal in reinforcement learning. This paper proposes the $\mathtt{VRMPO}$ algorithm, a sample-efficient policy gradient method with stochastic mirror descent. In $\mathtt{VRMPO}$, a novel…

Machine Learning · Computer Science 2022-02-10 Long Yang , Yu Zhang , Gang Zheng , Qian Zheng , Pengfei Li , Jianhang Huang , Jun Wen , Gang Pan

Beyond KL Divergence: Policy Optimization with Flexible Bregman Divergences for LLM Reasoning

Policy optimization methods like Group Relative Policy Optimization (GRPO) and its variants have achieved strong results on mathematical reasoning and code generation tasks. Despite extensive exploration of reward processing strategies and…

Machine Learning · Computer Science 2026-02-05 Rui Yuan , Mykola Khandoga , Vinay Kumar Sankarapu

Divergence-Augmented Policy Optimization

In deep reinforcement learning, policy optimization methods need to deal with issues such as function approximation and the reuse of off-policy data. Standard policy gradient methods do not handle off-policy data well, leading to premature…

Machine Learning · Computer Science 2025-01-28 Qing Wang , Yingru Li , Jiechao Xiong , Tong Zhang

Block Policy Mirror Descent

In this paper, we present a new policy gradient (PG) method, namely the block policy mirror descent (BPMD) method, for solving a class of regularized reinforcement learning (RL) problems with (strongly) convex regularizers. Compared to the…

Machine Learning · Computer Science 2022-09-20 Guanghui Lan , Yan Li , Tuo Zhao

Mirror Descent Policy Optimisation for Robust Constrained Markov Decision Processes

Safety is an essential requirement for reinforcement learning systems. The newly emerging framework of robust constrained Markov decision processes allows learning policies that satisfy long-term constraints while providing guarantees under…

Machine Learning · Computer Science 2025-12-19 David M. Bossens , Atsushi Nitanda

Reduced Policy Optimization for Continuous Control with Hard Constraints

Recent advances in constrained reinforcement learning (RL) have endowed reinforcement learning with certain safety guarantees. However, deploying existing constrained RL algorithms in continuous control tasks with general hard constraints…

Machine Learning · Computer Science 2023-12-22 Shutong Ding , Jingya Wang , Yali Du , Ye Shi

Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence

Policy optimization, which finds the desired policy by maximizing value functions via optimization techniques, lies at the heart of reinforcement learning (RL). In addition to value maximization, other practical considerations arise as…

Machine Learning · Computer Science 2023-01-12 Wenhao Zhan , Shicong Cen , Baihe Huang , Yuxin Chen , Jason D. Lee , Yuejie Chi

Sparse Q-learning with Mirror Descent

This paper explores a new framework for reinforcement learning based on online convex optimization, in particular mirror descent and related algorithms. Mirror descent can be viewed as an enhanced gradient method, particularly suited to…

Machine Learning · Computer Science 2012-10-19 Sridhar Mahadevan , Bo Liu

An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient

We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed by Papini et al. (2018) for reinforcement learning. We provide an improved convergence analysis of SVRPG and show that it can find an $\epsilon$-approximate…

Machine Learning · Computer Science 2019-05-30 Pan Xu , Felicia Gao , Quanquan Gu

GBM-based Bregman Proximal Algorithms for Constrained Learning

As the complexity of learning tasks surges, modern machine learning encounters a new constrained learning paradigm characterized by more intricate and data-driven function constraints. Prominent applications include Neyman-Pearson…

Machine Learning · Computer Science 2023-08-22 Zhenwei Lin , Qi Deng

On the Linear Convergence of Natural Policy Gradient Algorithm

Markov Decision Processes are classically solved using Value Iteration and Policy Iteration algorithms. Recent interest in Reinforcement Learning has motivated the study of methods inspired by optimization, such as gradient ascent. Among…

Machine Learning · Computer Science 2021-05-05 Sajad Khodadadian , Prakirt Raj Jhunjhunwala , Sushil Mahavir Varma , Siva Theja Maguluri

Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models

A key challenge in applying reinforcement learning (RL) to diffusion large language models (dLLMs) lies in the intractability of their likelihood functions, which are essential for the RL objective, necessitating corresponding approximation…

Machine Learning · Computer Science 2025-10-15 Nianyi Lin , Jiajie Zhang , Lei Hou , Juanzi Li

Learning to Constrain Policy Optimization with Virtual Trust Region

We introduce a constrained optimization method for policy gradient reinforcement learning, which uses a virtual trust region to regulate each policy update. In addition to using the proximity of one single old policy as the normal trust…

Machine Learning · Computer Science 2022-09-19 Hung Le , Thommen Karimpanal George , Majid Abdolshah , Dung Nguyen , Kien Do , Sunil Gupta , Svetha Venkatesh

Proximal Policy Optimization Algorithms

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.…

Machine Learning · Computer Science 2017-08-29 John Schulman , Filip Wolski , Prafulla Dhariwal , Alec Radford , Oleg Klimov

BSO: Safety Alignment Is Density Ratio Matching

Aligning language models for both helpfulness and safety typically requires complex pipelines: separate reward and cost models, online reinforcement learning, and primal-dual updates. Recent direct preference optimization approaches simplify…

Machine Learning · Computer Science 2026-05-13 Tien-Phat Nguyen , Truong Nguyen , Thin Nguyen , Duy Minh Ho Nguyen , Ngoc-Thanh Dinh , Trung Le

Variable Bregman Majorization-Minimization Algorithm and its Application to Dirichlet Maximum Likelihood Estimation

We propose a novel Bregman descent algorithm for minimizing a convex function that is expressed as the sum of a differentiable part (defined over an open set) and a possibly nonsmooth term. The approach, referred to as the Variable Bregman…

Machine Learning · Computer Science 2025-02-06 Ségolène Martin , Jean-Christophe Pesquet , Gabriele Steidl , Ismail Ben Ayed

DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization

Most reinforcement learning algorithms seek a single optimal strategy that solves a given task. However, it can often be valuable to learn a diverse set of solutions, for instance, to make an agent's interaction with users more engaging, or…

Machine Learning · Computer Science 2024-01-09 Wentse Chen , Shiyu Huang , Yuan Chiang , Tim Pearce , Wei-Wei Tu , Ting Chen , Jun Zhu

Distributed Mirror Descent Algorithm with Bregman Damping for Nonsmooth Constrained Optimization

To solve distributed optimization efficiently with various constraints and nonsmooth functions, we propose a distributed mirror descent algorithm with embedded Bregman damping, as a generalization of conventional distributed…

Optimization and Control · Mathematics 2021-08-30 Guanpu Chen , Weijian Li , Gehui Xu , Yiguang Hong

Value Mirror Descent for Reinforcement Learning

Value iteration-type methods have been extensively studied for computing a nearly optimal value function in reinforcement learning (RL). Under a generative sampling model, these methods can achieve sharper sample complexity than policy…

Optimization and Control · Mathematics 2026-04-08 Zhichao Jia , Guanghui Lan

Stochastic Variance-Reduced Policy Gradient

In this paper, we propose a novel reinforcement-learning algorithm consisting in a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods…

Machine Learning · Computer Science 2018-06-15 Matteo Papini , Damiano Binaghi , Giuseppe Canonaco , Matteo Pirotta , Marcello Restelli