Related papers: Acceleration in Policy Optimization

Accelerating Optimization via Adaptive Prediction

We present a powerful general framework for designing data-dependent optimization algorithms, building upon and unifying recent techniques in adaptive regularization, optimistic gradient predictions, and problem-dependent randomization. We…

Machine Learning · Statistics 2015-10-14 Mehryar Mohri, Scott Yang

Policy Optimization as Wasserstein Gradient Flows

Policy optimization is a core component of reinforcement learning (RL), and most existing RL methods directly optimize parameters of a policy based on maximizing the expected total reward, or its surrogate. Though often achieving…

Machine Learning · Computer Science 2018-08-10 Ruiyi Zhang, Changyou Chen, Chunyuan Li, Lawrence Carin

Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

Humans are masters at quickly learning many complex tasks, relying on an approximate understanding of the dynamics of their environments. In much the same way, we would like our learning agents to quickly adapt to new tasks. In this paper,…

Robotics · Computer Science 2019-10-10 Arunkumar Byravan, Jost Tobias Springenberg, Abbas Abdolmaleki, Roland Hafner, Michael Neunert, Thomas Lampe, Noah Siegel, Nicolas Heess, Martin Riedmiller

A Parametric Class of Approximate Gradient Updates for Policy Optimization

Approaches to policy optimization have been motivated from diverse principles, based on how the parametric model is interpreted (e.g. value versus policy representation) or how the learning objective is formulated, yet they share a common…

Machine Learning · Computer Science 2022-06-20 Ramki Gummadi, Saurabh Kumar, Junfeng Wen, Dale Schuurmans

Scalable Multi-Objective and Meta Reinforcement Learning via Gradient Estimation

We study the problem of efficiently estimating policies that simultaneously optimize multiple objectives in reinforcement learning (RL). Given $n$ objectives (or tasks), we seek the optimal partition of these objectives into $k \ll n$…

Machine Learning · Computer Science 2026-02-24 Zhenshuo Zhang, Minxuan Duan, Youran Ye, Hongyang R. Zhang

Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL

While policy optimization algorithms have played an important role in recent empirical success of Reinforcement Learning (RL), the existing theoretical understanding of policy optimization remains rather limited -- they are either…

Machine Learning · Computer Science 2023-12-05 Qinghua Liu, Gellért Weisz, András György, Chi Jin, Csaba Szepesvári

Adaptive Probabilistic Trajectory Optimization via Efficient Approximate Inference

Robotic systems must be able to quickly and robustly make decisions when operating in uncertain and dynamic environments. While Reinforcement Learning (RL) can be used to compute optimal policies with little prior knowledge about the…

Robotics · Computer Science 2016-09-13 Yunpeng Pan, Xinyan Yan, Evangelos Theodorou, Byron Boots

Policy Gradients for Probabilistic Constrained Reinforcement Learning

This paper considers the problem of learning safe policies in the context of reinforcement learning (RL). In particular, we consider the notion of probabilistic safety. This is, we aim to design policies that maintain the state of the…

Machine Learning · Computer Science 2023-04-20 Weiqin Chen, Dharmashankar Subramanian, Santiago Paternain

Policy Gradient Algorithms Implicitly Optimize by Continuation

Direct policy optimization in reinforcement learning is usually solved with policy-gradient algorithms, which optimize policy parameters via stochastic gradient ascent. This paper provides a new theoretical interpretation and justification…

Machine Learning · Computer Science 2023-10-24 Adrien Bolland, Gilles Louppe, Damien Ernst

Learning to Optimize

Algorithm design is a laborious process and often requires many iterations of ideation and validation. In this paper, we explore automating algorithm design and present a method to learn an optimization algorithm, which we believe to be the…

Machine Learning · Computer Science 2016-06-07 Ke Li, Jitendra Malik

Optimism as Risk-Seeking in Multi-Agent Reinforcement Learning

Risk sensitivity has become a central theme in reinforcement learning (RL), where convex risk measures and robust formulations provide principled ways to model preferences beyond expected return. Recent extensions to multi-agent RL (MARL)…

Machine Learning · Computer Science 2025-11-12 Runyu Zhang, Na Li, Asuman Ozdaglar, Jeff Shamma, Gioele Zardini

Path Planning using Reinforcement Learning: A Policy Iteration Approach

With the impact of real-time processing being realized in the recent past, the need for efficient implementations of reinforcement learning algorithms has been on the rise. Albeit the numerous advantages of Bellman equations utilized in RL…

Machine Learning · Computer Science 2023-03-15 Saumil Shivdikar, Jagannath Nirmal

A policy gradient approach for optimization of smooth risk measures

We propose policy gradient algorithms for solving a risk-sensitive reinforcement learning (RL) problem in on-policy as well as off-policy settings. We consider episodic Markov decision processes, and model the risk using the broad class of…

Machine Learning · Computer Science 2024-06-25 Nithia Vijayan, Prashanth L. A.

Meta-Reinforcement Learning with Universal Policy Adaptation: Provable Near-Optimality under All-task Optimum Comparator

Meta-reinforcement learning (Meta-RL) has attracted attention due to its capability to enhance reinforcement learning (RL) algorithms, in terms of data efficiency and generalizability. In this paper, we develop a bilevel optimization…

Machine Learning · Computer Science 2024-10-15 Siyuan Xu, Minghui Zhu

Evolved Policy Gradients

We propose a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent, which optimizes its policy to minimize this loss, will achieve…

Machine Learning · Computer Science 2018-05-01 Rein Houthooft, Richard Y. Chen, Phillip Isola, Bradly C. Stadie, Filip Wolski, Jonathan Ho, Pieter Abbeel

Beyond the One Step Greedy Approach in Reinforcement Learning

The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g., $n$-step and trace-based returns, have been…

Artificial Intelligence · Computer Science 2018-08-01 Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Optimistic Proximal Policy Optimization

Reinforcement Learning, a machine learning framework for training an autonomous agent based on rewards, has shown outstanding results in various domains. However, it is known that learning a good policy is difficult in a domain where…

Machine Learning · Computer Science 2019-06-27 Takahisa Imagawa, Takuya Hiraoka, Yoshimasa Tsuruoka

Accelerated Reinforcement Learning

Policy gradient methods are widely used in reinforcement learning algorithms to search for better policies in the parameterized policy space. They do gradient search in the policy space and are known to converge very slowly. Nesterov…

Machine Learning · Computer Science 2018-04-26 K. Lakshmanan

Optimistic Optimisation of Composite Objective with Exponentiated Update

This paper proposes a new family of algorithms for the online optimisation of composite objectives. The algorithms can be interpreted as the combination of the exponentiated gradient and $p$-norm algorithm. Combined with algorithmic ideas…

Optimization and Control · Mathematics 2022-08-09 Weijia Shao, Fikret Sivrikaya, Sahin Albayrak

Iterative Amortized Policy Optimization

Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control, enabling the estimation and sampling of high-value actions. From the variational inference perspective on RL, policy networks, when…

Machine Learning · Computer Science 2021-10-26 Joseph Marino, Alexandre Piché, Alessandro Davide Ialongo, Yisong Yue