Related papers: Particle Value Functions

Policy Gradients for Probabilistic Constrained Reinforcement Learning

This paper considers the problem of learning safe policies in the context of reinforcement learning (RL). In particular, we consider the notion of probabilistic safety. That is, we aim to design policies that maintain the state of the…

Machine Learning · Computer Science 2023-04-20 Weiqin Chen, Dharmashankar Subramanian, Santiago Paternain

Risk-Sensitive Reinforcement Learning via Policy Gradient Search

The objective in a traditional reinforcement learning (RL) problem is to find a policy that optimizes the expected value of a performance metric such as the infinite-horizon cumulative discounted or long-run average cost/reward. In…

Machine Learning · Computer Science 2022-05-25 Prashanth L. A., Michael Fu

Policy Gradient from Demonstration and Curiosity

With reinforcement learning, an agent could learn complex behaviors from high-level abstractions of the task. However, exploration and reward shaping remained challenging for existing methods, especially in scenarios where the extrinsic…

Machine Learning · Computer Science 2020-06-11 Jie Chen, Wenjun Xu

Reinforcement Learning by Value Gradients

The concept of the value-gradient is introduced and developed in the context of reinforcement learning. It is shown that by learning the value-gradients exploration or stochastic behaviour is no longer needed to find locally optimal…

Neural and Evolutionary Computing · Computer Science 2008-03-26 Michael Fairbank

The Reinforce Policy Gradient Algorithm Revisited

We revisit the Reinforce policy gradient algorithm from the literature. Note that this algorithm typically works with cost returns obtained over random-length episodes obtained from either termination upon reaching a goal state (as with…

Machine Learning · Computer Science 2023-10-10 Shalabh Bhatnagar

Inverse Risk-Sensitive Reinforcement Learning

We address the problem of inverse reinforcement learning in Markov decision processes where the agent is risk-sensitive. In particular, we model risk-sensitivity in a reinforcement learning framework by making use of models of human…

Machine Learning · Computer Science 2017-11-23 Lillian J. Ratliff, Eric Mazumdar

Policy Gradient using Weak Derivatives for Reinforcement Learning

This paper considers policy search in continuous state-action reinforcement learning problems. Typically, one computes search directions using a classic expression for the policy gradient called the Policy Gradient Theorem, which decomposes…

Machine Learning · Computer Science 2020-04-13 Sujay Bhatt, Alec Koppel, Vikram Krishnamurthy

On The Fragility of Learned Reward Functions

Reward functions are notoriously difficult to specify, especially for tasks with complex goals. Reward learning approaches attempt to infer reward functions from human feedback and preferences. Prior works on reward learning have mainly…

Machine Learning · Computer Science 2023-01-11 Lev McKinney, Yawen Duan, David Krueger, Adam Gleave

Policy Gradient in Partially Observable Environments: Approximation and Convergence

Policy gradient is a generic and flexible reinforcement learning approach that generally enjoys simplicity in analysis, implementation, and deployment. In the last few decades, this approach has been extensively advanced for fully…

Machine Learning · Computer Science 2020-05-26 Kamyar Azizzadenesheli, Yisong Yue, Animashree Anandkumar

Policy Gradient for Continuing Tasks in Non-stationary Markov Decision Processes

Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities. In this paper we consider the problem of finding optimal…

Machine Learning · Computer Science 2020-10-19 Santiago Paternain, Juan Andres Bazerque, Alejandro Ribeiro

A Distributional Perspective on Reinforcement Learning

In this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common approach to reinforcement learning which…

Machine Learning · Computer Science 2017-07-24 Marc G. Bellemare, Will Dabney, Rémi Munos

Efficiently Learning from Revealed Preference

In this paper, we consider the revealed preferences problem from a learning perspective. Every day, a price vector and a budget are drawn from an unknown distribution, and a rational agent buys his most preferred bundle according to some…

Computer Science and Game Theory · Computer Science 2012-11-20 Morteza Zadimoghaddam, Aaron Roth

Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting

Policy gradient methods have shown success in learning control policies for high-dimensional dynamical systems. Their biggest downside is the amount of exploration they require before yielding high-performing policies. In a lifelong…

Machine Learning · Computer Science 2020-10-23 Jorge A. Mendez, Boyu Wang, Eric Eaton

Extracting Reward Functions from Diffusion Models

Diffusion models have achieved remarkable results in image generation, and have similarly been used to learn high-performing policies in sequential decision-making tasks. Decision-making diffusion models can be trained on lower-quality…

Machine Learning · Computer Science 2023-12-12 Felipe Nuti, Tim Franzmeyer, João F. Henriques

On Policy Gradients

The goal of policy gradient approaches is to find a policy in a given class of policies which maximizes the expected return. Given a differentiable model of the policy, we want to apply a gradient-ascent technique to reach a local optimum.…

Machine Learning · Computer Science 2019-11-13 Mattis Manfred Kämmerer

Inverse Policy Evaluation for Value-based Sequential Decision-making

Value-based methods for reinforcement learning lack generally applicable ways to derive behavior from a value function. Many approaches involve approximate value iteration (e.g., $Q$-learning), and acting greedily with respect to the…

Machine Learning · Computer Science 2020-08-27 Alan Chan, Kris de Asis, Richard S. Sutton

Risk-Sensitive Policy with Distributional Reinforcement Learning

Classical reinforcement learning (RL) techniques are generally concerned with the design of decision-making policies driven by the maximisation of the expected outcome. Nevertheless, this approach does not take into consideration the…

Machine Learning · Computer Science 2023-01-02 Thibaut Théate, Damien Ernst

Partial Policy Gradients for RL in LLMs

Reinforcement learning is a framework for learning to act sequentially in an unknown environment. We propose a natural approach for modeling policy structure in policy gradients. The key idea is to optimize for a subset of future rewards:…

Machine Learning · Computer Science 2026-03-09 Puneet Mathur, Branislav Kveton, Subhojyoti Mukherjee, Viet Dac Lai

Efficient Sample Reuse in Policy Gradients with Parameter-based Exploration

The policy gradient approach is a flexible and powerful reinforcement learning method particularly for problems with continuous actions such as robot control. A common challenge in this scenario is how to reduce the variance of policy…

Machine Learning · Computer Science 2013-01-18 Tingting Zhao, Hirotaka Hachiya, Voot Tangkaratt, Jun Morimoto, Masashi Sugiyama

The Local Optimality of Reinforcement Learning by Value Gradients, and its Relationship to Policy Gradient Learning

In this theoretical paper we are concerned with the problem of learning a value function by a smooth general function approximator, to solve a deterministic episodic control problem in a large continuous state space. It is shown that…

Machine Learning · Computer Science 2011-01-04 Michael Fairbank, Eduardo Alonso