Related papers: Model-Based Reparameterization Policy Gradient Met…

Reparameterized Policy Learning for Multimodal Trajectory Optimization

We investigate the challenge of parametrizing policies for reinforcement learning (RL) in high-dimensional continuous action spaces. Our objective is to develop a multimodal policy that overcomes limitations inherent in the commonly-used…

Machine Learning · Computer Science 2023-07-21 Zhiao Huang, Litian Liang, Zhan Ling, Xuanlin Li, Chuang Gan, Hao Su

Reparameterization Proximal Policy Optimization

By leveraging differentiable dynamics, Reparameterization Policy Gradient (RPG) achieves high sample efficiency. However, current approaches are hindered by two critical limitations: the under-utilization of computationally expensive…

Machine Learning · Computer Science 2026-02-09 Hai Zhong, Xun Wang, Zhuoran Li, Longbo Huang

Reparameterization Flow Policy Optimization

Reparameterization Policy Gradient (RPG) has emerged as a powerful paradigm for model-based reinforcement learning, enabling high sample efficiency by backpropagating gradients through differentiable dynamics. However, prior RPG approaches…

Machine Learning · Computer Science 2026-02-04 Hai Zhong, Zhuoran Li, Xun Wang, Longbo Huang

Variance Reduced Policy Gradient Method for Multi-Objective Reinforcement Learning

Multi-Objective Reinforcement Learning (MORL) is a generalization of traditional Reinforcement Learning (RL) that aims to optimize multiple, often conflicting objectives simultaneously rather than focusing on a single reward. This approach…

Machine Learning · Computer Science 2025-08-15 Davide Guidobene, Lorenzo Benedetti, Diego Arapovic

Convergence Guarantees of Model-free Policy Gradient Methods for LQR with Stochastic Data

Policy gradient (PG) methods are the backbone of many reinforcement learning algorithms due to their good performance in policy optimization problems. As a gradient-based approach, PG methods typically rely on knowledge of the system…

Systems and Control · Electrical Eng. & Systems 2026-04-02 Bowen Song, Andrea Iannelli

A unified view of likelihood ratio and reparameterization gradients and an optimal importance sampling scheme

Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used throughout machine and reinforcement learning; however, they are usually explained as simple mathematical tricks without providing any insight into their nature.…

Machine Learning · Computer Science 2019-10-16 Paavo Parmas, Masashi Sugiyama

Fast and Correct Gradient-Based Optimisation for Probabilistic Programming via Smoothing

We study the foundations of variational inference, which frames posterior inference as an optimisation problem, for probabilistic programming. The dominant approach for optimisation in practice is stochastic gradient descent. In particular,…

Programming Languages · Computer Science 2023-01-10 Basim Khajwal, C.-H. Luke Ong, Dominik Wagner

A unified view of likelihood ratio and reparameterization gradients

Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used to estimate gradients of expectations throughout machine learning and reinforcement learning; however, they are usually explained as simple mathematical tricks,…

Machine Learning · Computer Science 2021-06-01 Paavo Parmas, Masashi Sugiyama

New results on subgradient methods for strongly convex optimization problems with a unified analysis

We develop subgradient- and gradient-based methods for minimizing strongly convex functions under a notion which generalizes the standard Euclidean strong convexity. We propose a unifying framework for subgradient methods which yields two…

Optimization and Control · Mathematics 2016-08-19 Masaru Ito

An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

In this paper, we revisit and improve the convergence of policy gradient (PG), natural PG (NPG) methods, and their variance-reduced variants, under general smooth policy parametrizations. More specifically, with the Fisher information…

Machine Learning · Computer Science 2022-11-17 Yanli Liu, Kaiqing Zhang, Tamer Başar, Wotao Yin

Settling the Variance of Multi-Agent Policy Gradients

Policy gradient (PG) methods are popular reinforcement learning (RL) methods where a baseline is often applied to reduce the variance of gradient estimates. In multi-agent RL (MARL), although the PG theorem can be naturally extended, the…

Machine Learning · Computer Science 2022-04-05 Jakub Grudzien Kuba, Muning Wen, Yaodong Yang, Linghui Meng, Shangding Gu, Haifeng Zhang, David Henry Mguni, Jun Wang

PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos

Previously, the exploding gradient problem has been explained to be central in deep learning and model-based reinforcement learning, because it causes numerical issues and instability in optimization. Our experiments in model-based…

Machine Learning · Computer Science 2019-02-06 Paavo Parmas, Carl Edward Rasmussen, Jan Peters, Kenji Doya

Trainable Projected Gradient Method for Robust Fine-tuning

Recent studies on transfer learning have shown that selectively fine-tuning a subset of layers or customizing different learning rates for each layer can greatly improve robustness to out-of-distribution (OOD) data and retain generalization…

Computer Vision and Pattern Recognition · Computer Science 2023-03-29 Junjiao Tian, Xiaoliang Dai, Chih-Yao Ma, Zecheng He, Yen-Cheng Liu, Zsolt Kira

Efficient Sample Reuse in Policy Gradients with Parameter-based Exploration

The policy gradient approach is a flexible and powerful reinforcement learning method particularly for problems with continuous actions such as robot control. A common challenge in this scenario is how to reduce the variance of policy…

Machine Learning · Computer Science 2013-01-18 Tingting Zhao, Hirotaka Hachiya, Voot Tangkaratt, Jun Morimoto, Masashi Sugiyama

Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms

Variational inference using the reparameterization trick has enabled large-scale approximate Bayesian inference in complex probabilistic models, leveraging stochastic optimization to sidestep intractable expectations. The reparameterization…

Machine Learning · Statistics 2020-02-13 Christian A. Naesseth, Francisco J. R. Ruiz, Scott W. Linderman, David M. Blei

Gradient Monitored Reinforcement Learning

This paper presents a novel neural network training approach for faster convergence and better generalization abilities in deep reinforcement learning. Particularly, we focus on the enhancement of training and evaluation performance in…

Machine Learning · Computer Science 2020-05-26 Mohammed Sharafath Abdul Hameed, Gavneet Singh Chadha, Andreas Schwung, Steven X. Ding

GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning

Reinforcement Learning (RL) can directly enhance the reasoning capabilities of large language models without extensive reliance on Supervised Fine-Tuning (SFT). In this work, we revisit the traditional Policy Gradient (PG) mechanism and…

Machine Learning · Computer Science 2026-02-04 Xiangxiang Chu, Hailang Huang, Xiao Zhang, Fei Wei, Yong Wang

Gradient-based Regularization Parameter Selection for Problems with Non-smooth Penalty Functions

In high-dimensional and/or non-parametric regression problems, regularization (or penalization) is used to control model complexity and induce desired structure. Each penalty has a weight parameter that indicates how strongly the structure…

Machine Learning · Statistics 2017-03-30 Jean Feng, Noah Simon

On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning

Policy gradient algorithms have been successfully applied to enhance the reasoning capabilities of large language models (LLMs). KL regularization is ubiquitous, yet the design surface, choice of KL direction (forward vs. reverse),…

Machine Learning · Computer Science 2026-02-20 Yifan Zhang, Yifeng Liu, Huizhuo Yuan, Yang Yuan, Quanquan Gu, Andrew Chi-Chih Yao

PWM: Policy Learning with Multi-Task World Models

Reinforcement Learning (RL) has made significant strides in complex tasks but struggles in multi-task settings with different embodiments. World model methods offer scalability by learning a simulation of the environment but often rely on…

Machine Learning · Computer Science 2025-02-25 Ignat Georgiev, Varun Giridhar, Nicklas Hansen, Animesh Garg