Related papers: Improving Gradient Estimation by Incorporating Sen…

Efficient Gradient Estimation for Motor Control Learning

The task of estimating the gradient of a function in the presence of noise is central to several forms of reinforcement learning, including policy search methods. We present two techniques for reducing gradient estimation errors in the…

Machine Learning · Computer Science · 2012-12-12 · Gregory Lawrence, Noah Cowan, Stuart Russell

On Policy Gradients

The goal of policy gradient approaches is to find a policy in a given class of policies which maximizes the expected return. Given a differentiable model of the policy, we want to apply a gradient-ascent technique to reach a local optimum.…

Machine Learning · Computer Science · 2019-11-13 · Mattis Manfred Kämmerer

Hindsight policy gradients

A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable…

Machine Learning · Computer Science · 2019-02-21 · Paulo Rauber, Avinash Ummadisingu, Filipe Mutz, Juergen Schmidhuber

Efficient Sample Reuse in Policy Gradients with Parameter-based Exploration

The policy gradient approach is a flexible and powerful reinforcement learning method particularly for problems with continuous actions such as robot control. A common challenge in this scenario is how to reduce the variance of policy…

Machine Learning · Computer Science · 2013-01-18 · Tingting Zhao, Hirotaka Hachiya, Voot Tangkaratt, Jun Morimoto, Masashi Sugiyama

Model-free Policy Learning with Reward Gradients

Despite the increasing popularity of policy gradient methods, they are yet to be widely utilized in sample-scarce applications, such as robotics. The sample efficiency could be improved by making the best use of available information. As a…

Machine Learning · Computer Science · 2023-11-03 · Qingfeng Lan, Samuele Tosatto, Homayoon Farrahi, A. Rupam Mahmood

The Reinforce Policy Gradient Algorithm Revisited

We revisit the Reinforce policy gradient algorithm from the literature. Note that this algorithm typically works with cost returns obtained over random-length episodes that result from either termination upon reaching a goal state (as with…

Machine Learning · Computer Science · 2023-10-10 · Shalabh Bhatnagar

Gradient-Aware Model-based Policy Search

Traditional model-based reinforcement learning approaches learn a model of the environment dynamics without explicitly considering how it will be used by the agent. In the presence of misspecified model classes, this can lead to poor…

Machine Learning · Computer Science · 2020-10-20 · Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, Marcello Restelli

Stabilizing Policy Gradient Methods via Reward Profiling

Policy gradient methods, which have been extensively studied in the last decade, offer an effective and efficient framework for reinforcement learning problems. However, their performance can often be unsatisfactory, suffering from…

Machine Learning · Computer Science · 2026-01-27 · Shihab Ahmed, El Houcine Bergou, Aritra Dutta, Yue Wang

Identifying Policy Gradient Subspaces

Policy gradient methods hold great potential for solving complex continuous control tasks. Still, their training efficiency can be improved by exploiting structure within the optimization problem. Recent work indicates that supervised…

Machine Learning · Computer Science · 2024-03-19 · Jan Schneider, Pierre Schumacher, Simon Guist, Le Chen, Daniel Häufle, Bernhard Schölkopf, Dieter Büchler

Measuring the performance of sensors that report uncertainty

We provide methods to validate and compare sensor outputs, or inference algorithms applied to sensor data, by adapting statistical scoring rules. The reported output should either be in the form of a prediction interval or of a parameter…

Data Analysis, Statistics and Probability · Physics · 2015-07-07 · A. D. Martin, T. C. A. Molteno, M. Parry

Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods

In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem. The…

Machine Learning · Computer Science · 2012-06-26 · Gergely Neu, Csaba Szepesvari

Gradient-free Policy Architecture Search and Adaptation

We develop a method for policy architecture search and adaptation via gradient-free optimization which can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward we develop a model that can…

Machine Learning · Computer Science · 2017-10-18 · Sayna Ebrahimi, Anna Rohrbach, Trevor Darrell

Improving Deep Policy Gradients with Value Function Search

Deep Policy Gradient (PG) algorithms employ value networks to drive the learning of parameterized policies and reduce the variance of the gradient estimates. However, value function approximation gets stuck in local optima and struggles to…

Machine Learning · Computer Science · 2023-02-21 · Enrico Marchesini, Christopher Amato

Reinforcement Learning Using Expectation Maximization Based Guided Policy Search for Stochastic Dynamics

Guided policy search algorithms have been proven to work with incredible accuracy for not only controlling a complicated dynamical system, but also learning optimal policies from various unseen instances. One assumes the true nature of the…

Systems and Control · Electrical Eng. & Systems · 2020-10-02 · Prakash Mallick, Zhiyong Chen, Mohsen Zamani

Policy Gradient using Weak Derivatives for Reinforcement Learning

This paper considers policy search in continuous state-action reinforcement learning problems. Typically, one computes search directions using a classic expression for the policy gradient called the Policy Gradient Theorem, which decomposes…

Machine Learning · Computer Science · 2020-04-13 · Sujay Bhatt, Alec Koppel, Vikram Krishnamurthy

Environmental Information Improves Robotic Search Performance

We address the problem where a mobile search agent seeks to find an unknown number of stationary objects distributed in a bounded search domain, and the search mission is subject to time/distance constraints. Our work accounts for false…

Robotics · Computer Science · 2018-06-26 · Harun Yetkin, Collin Lutz, Daniel Stilwell

Accelerating Policy Gradient by Estimating Value Function from Prior Computation in Deep Reinforcement Learning

This paper investigates the use of prior computation to estimate the value function to improve sample efficiency in on-policy policy gradient methods in reinforcement learning. Our approach is to estimate the value function from prior…

Machine Learning · Computer Science · 2023-02-06 · Md Masudur Rahman, Yexiang Xue

Reinforcement Learning by Value Gradients

The concept of the value-gradient is introduced and developed in the context of reinforcement learning. It is shown that by learning the value-gradients, exploration or stochastic behaviour is no longer needed to find locally optimal…

Neural and Evolutionary Computing · Computer Science · 2008-03-26 · Michael Fairbank

Constrained Policy Gradient Method for Safe and Fast Reinforcement Learning: a Neural Tangent Kernel Based Approach

This paper presents a constrained policy gradient algorithm. We introduce constraints for safe learning with the following steps. First, learning is slowed down (lazy learning) so that the episodic policy change can be computed with the…

Machine Learning · Computer Science · 2022-01-24 · Balázs Varga, Balázs Kulcsár, Morteza Haghir Chehreghani

The Optimal Reward Baseline for Gradient-Based Reinforcement Learning

There exist a number of reinforcement learning algorithms which learn by climbing the gradient of expected reward. Their long-run convergence has been proved, even in partially observable environments with non-deterministic actions, and…

Machine Learning · Computer Science · 2013-01-14 · Lex Weaver, Nigel Tao
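Several of the abstracts above (On Policy Gradients; Efficient Sample Reuse in Policy Gradients with Parameter-based Exploration; Improving Deep Policy Gradients with Value Function Search; The Optimal Reward Baseline for Gradient-Based Reinforcement Learning) concern gradient ascent on expected return and reducing the variance of the gradient estimate. As an illustration only, here is a minimal NumPy sketch of the likelihood-ratio (score-function, REINFORCE-style) estimator with a constant baseline. It assumes a one-step problem, a 1-D Gaussian policy and a hypothetical quadratic reward. The variance-minimizing constant baseline used here is the textbook form for a single scalar parameter; it is not claimed to be the estimator derived in any of the listed papers, and all function names are hypothetical.

```python
import numpy as np


def sample_scores_and_rewards(theta, sigma, reward_fn, n, rng):
    """Draw n actions from a 1-D Gaussian policy a ~ N(theta, sigma^2).

    Returns the score d/dtheta log pi(a) for each action and its reward.
    """
    actions = rng.normal(theta, sigma, size=n)
    # Score of the Gaussian mean parameter: (a - theta) / sigma^2
    scores = (actions - theta) / sigma**2
    return scores, reward_fn(actions)


def per_sample_gradients(scores, rewards, baseline):
    """Likelihood-ratio terms score * (reward - baseline).

    A fixed, action-independent baseline leaves the expected gradient
    unchanged (because E[score] = 0) but can change its variance.
    """
    return scores * (rewards - baseline)


def variance_minimizing_baseline(scores, rewards):
    """Constant baseline b = E[score^2 * reward] / E[score^2] for a scalar parameter.

    Estimated here from the same samples, which adds a small O(1/n) bias.
    """
    weights = scores**2
    return float(np.sum(weights * rewards) / np.sum(weights))


def reward(a):
    # Hypothetical toy reward; the true gradient of E[r] at theta = 0 is 4.0
    return 10.0 - (a - 2.0) ** 2


rng = np.random.default_rng(0)
theta, sigma = 0.0, 1.0
scores, rewards = sample_scores_and_rewards(theta, sigma, reward, 100_000, rng)

baselines = {
    "none": 0.0,
    "mean reward": float(rewards.mean()),
    "variance-min": variance_minimizing_baseline(scores, rewards),
}
for name, b in baselines.items():
    g = per_sample_gradients(scores, rewards, b)
    print(f"{name:>12}: b={b:6.3f}  mean={g.mean():6.3f}  var={g.var():6.2f}")
```

For this toy setup the true gradient at theta = 0 is 4, and the population variances of the three estimators work out analytically to 47 (no baseline), 42 (mean-reward baseline, b = 5) and 38 (variance-minimizing baseline, b = 3); the printed sample values should land near these. The mean-reward baseline is a common practical choice, but in general it is not the variance-minimizing one.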