
Related papers: Model-Based Reparameterization Policy Gradient Met…

200 papers

We investigate the challenge of parametrizing policies for reinforcement learning (RL) in high-dimensional continuous action spaces. Our objective is to develop a multimodal policy that overcomes limitations inherent in the commonly-used…

Machine Learning · Computer Science 2023-07-21 Zhiao Huang, Litian Liang, Zhan Ling, Xuanlin Li, Chuang Gan, Hao Su

By leveraging differentiable dynamics, Reparameterization Policy Gradient (RPG) achieves high sample efficiency. However, current approaches are hindered by two critical limitations: the under-utilization of computationally expensive…

Machine Learning · Computer Science 2026-02-09 Hai Zhong, Xun Wang, Zhuoran Li, Longbo Huang

Reparameterization Policy Gradient (RPG) has emerged as a powerful paradigm for model-based reinforcement learning, enabling high sample efficiency by backpropagating gradients through differentiable dynamics. However, prior RPG approaches…

Machine Learning · Computer Science 2026-02-04 Hai Zhong, Zhuoran Li, Xun Wang, Longbo Huang

Multi-Objective Reinforcement Learning (MORL) is a generalization of traditional Reinforcement Learning (RL) that aims to optimize multiple, often conflicting objectives simultaneously rather than focusing on a single reward. This approach…

Machine Learning · Computer Science 2025-08-15 Davide Guidobene, Lorenzo Benedetti, Diego Arapovic

Policy gradient (PG) methods are the backbone of many reinforcement learning algorithms due to their good performance in policy optimization problems. As a gradient-based approach, PG methods typically rely on knowledge of the system…

Systems and Control · Electrical Eng. & Systems 2026-04-02 Bowen Song, Andrea Iannelli

Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used throughout machine and reinforcement learning; however, they are usually explained as simple mathematical tricks without providing any insight into their nature.…

Machine Learning · Computer Science 2019-10-16 Paavo Parmas, Masashi Sugiyama

We study the foundations of variational inference, which frames posterior inference as an optimisation problem, for probabilistic programming. The dominant approach for optimisation in practice is stochastic gradient descent. In particular,…

Programming Languages · Computer Science 2023-01-10 Basim Khajwal, C.-H. Luke Ong, Dominik Wagner

Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used to estimate gradients of expectations throughout machine learning and reinforcement learning; however, they are usually explained as simple mathematical tricks,…

Machine Learning · Computer Science 2021-06-01 Paavo Parmas, Masashi Sugiyama

We develop subgradient- and gradient-based methods for minimizing strongly convex functions under a notion which generalizes the standard Euclidean strong convexity. We propose a unifying framework for subgradient methods which yields two…

Optimization and Control · Mathematics 2016-08-19 Masaru Ito

In this paper, we revisit and improve the convergence of policy gradient (PG), natural PG (NPG) methods, and their variance-reduced variants, under general smooth policy parametrizations. More specifically, with the Fisher information…

Machine Learning · Computer Science 2022-11-17 Yanli Liu, Kaiqing Zhang, Tamer Başar, Wotao Yin

Policy gradient (PG) methods are popular reinforcement learning (RL) methods where a baseline is often applied to reduce the variance of gradient estimates. In multi-agent RL (MARL), although the PG theorem can be naturally extended, the…

Machine Learning · Computer Science 2022-04-05 Jakub Grudzien Kuba, Muning Wen, Yaodong Yang, Linghui Meng, Shangding Gu, Haifeng Zhang, David Henry Mguni, Jun Wang

Previously, the exploding gradient problem has been explained to be central in deep learning and model-based reinforcement learning, because it causes numerical issues and instability in optimization. Our experiments in model-based…

Machine Learning · Computer Science 2019-02-06 Paavo Parmas, Carl Edward Rasmussen, Jan Peters, Kenji Doya

Recent studies on transfer learning have shown that selectively fine-tuning a subset of layers or customizing different learning rates for each layer can greatly improve robustness to out-of-distribution (OOD) data and retain generalization…

Computer Vision and Pattern Recognition · Computer Science 2023-03-29 Junjiao Tian, Xiaoliang Dai, Chih-Yao Ma, Zecheng He, Yen-Cheng Liu, Zsolt Kira

The policy gradient approach is a flexible and powerful reinforcement learning method particularly for problems with continuous actions such as robot control. A common challenge in this scenario is how to reduce the variance of policy…

Machine Learning · Computer Science 2013-01-18 Tingting Zhao, Hirotaka Hachiya, Voot Tangkaratt, Jun Morimoto, Masashi Sugiyama

Variational inference using the reparameterization trick has enabled large-scale approximate Bayesian inference in complex probabilistic models, leveraging stochastic optimization to sidestep intractable expectations. The reparameterization…

Machine Learning · Statistics 2020-02-13 Christian A. Naesseth, Francisco J. R. Ruiz, Scott W. Linderman, David M. Blei

This paper presents a novel neural network training approach for faster convergence and better generalization abilities in deep reinforcement learning. Particularly, we focus on the enhancement of training and evaluation performance in…

Machine Learning · Computer Science 2020-05-26 Mohammed Sharafath Abdul Hameed, Gavneet Singh Chadha, Andreas Schwung, Steven X. Ding

Reinforcement Learning (RL) can directly enhance the reasoning capabilities of large language models without extensive reliance on Supervised Fine-Tuning (SFT). In this work, we revisit the traditional Policy Gradient (PG) mechanism and…

Machine Learning · Computer Science 2026-02-04 Xiangxiang Chu, Hailang Huang, Xiao Zhang, Fei Wei, Yong Wang

In high-dimensional and/or non-parametric regression problems, regularization (or penalization) is used to control model complexity and induce desired structure. Each penalty has a weight parameter that indicates how strongly the structure…

Machine Learning · Statistics 2017-03-30 Jean Feng, Noah Simon

Policy gradient algorithms have been successfully applied to enhance the reasoning capabilities of large language models (LLMs). KL regularization is ubiquitous, yet the design surface, choice of KL direction (forward vs. reverse),…

Machine Learning · Computer Science 2026-02-20 Yifan Zhang, Yifeng Liu, Huizhuo Yuan, Yang Yuan, Quanquan Gu, Andrew Chi-Chih Yao

Reinforcement Learning (RL) has made significant strides in complex tasks but struggles in multi-task settings with different embodiments. World model methods offer scalability by learning a simulation of the environment but often rely on…

Machine Learning · Computer Science 2025-02-25 Ignat Georgiev, Varun Giridhar, Nicklas Hansen, Animesh Garg