Related papers: Reinforcement Learning by Value Gradients

The Local Optimality of Reinforcement Learning by Value Gradients, and its Relationship to Policy Gradient Learning

In this theoretical paper we are concerned with the problem of learning a value function by a smooth general function approximator, to solve a deterministic episodic control problem in a large continuous state space. It is shown that…

Machine Learning · Computer Science 2011-01-04 Michael Fairbank , Eduardo Alonso

The Optimal Reward Baseline for Gradient-Based Reinforcement Learning

There exist a number of reinforcement learning algorithms which learnby climbing the gradient of expected reward. Their long-runconvergence has been proved, even in partially observableenvironments with non-deterministic actions, and…

Machine Learning · Computer Science 2013-01-14 Lex Weaver , Nigel Tao

Global Convergence of Policy Gradient Methods in Reinforcement Learning, Games and Control

Policy gradient methods, where one searches for the policy of interest by maximizing the value functions using first-order information, become increasingly popular for sequential decision making in reinforcement learning, games, and…

Optimization and Control · Mathematics 2023-10-10 Shicong Cen , Yuejie Chi

Inverse Reinforcement Learning from a Gradient-based Learner

Inverse Reinforcement Learning addresses the problem of inferring an expert's reward function from demonstrations. However, in many applications, we not only have access to the expert's near-optimal behavior, but we also observe part of her…

Machine Learning · Computer Science 2021-09-03 Giorgia Ramponi , Gianluca Drappo , Marcello Restelli

Reinforcement Learning for Adaptive Routing

Reinforcement learning means learning a policy--a mapping of observations into actions--based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with…

Machine Learning · Computer Science 2017-05-25 Leonid Peshkin , Virginia Savova

Meta-Gradient Reinforcement Learning

The goal of reinforcement learning algorithms is to estimate and/or optimise the value function. However, unlike supervised learning, no teacher or oracle is available to provide the true value function. Instead, the majority of…

Machine Learning · Computer Science 2018-05-25 Zhongwen Xu , Hado van Hasselt , David Silver

Outcome-Driven Reinforcement Learning via Variational Inference

While reinforcement learning algorithms provide automated acquisition of optimal policies, practical application of such methods requires a number of design decisions, such as manually designing reward functions that not only define the…

Machine Learning · Computer Science 2022-12-29 Tim G. J. Rudner , Vitchyr H. Pong , Rowan McAllister , Yarin Gal , Sergey Levine

Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization

Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy that maximizes the cumulative rewards in sequential decision making. Most of methods in the existing…

Machine Learning · Statistics 2023-01-06 Chengchun Shi , Zhengling Qi , Jianing Wang , Fan Zhou

Derivative-Free Reinforcement Learning: A Review

Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments. In an unknown environment, the agent needs to explore the environment while exploiting the collected information, which…

Machine Learning · Computer Science 2021-02-12 Hong Qian , Yang Yu

Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods

In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem. The…

Machine Learning · Computer Science 2012-06-26 Gergely Neu , Csaba Szepesvari

Partial Policy Gradients for RL in LLMs

Reinforcement learning is a framework for learning to act sequentially in an unknown environment. We propose a natural approach for modeling policy structure in policy gradients. The key idea is to optimize for a subset of future rewards:…

Machine Learning · Computer Science 2026-03-09 Puneet Mathur , Branislav Kveton , Subhojyoti Mukherjee , Viet Dac Lai

A Distributional Perspective on Reinforcement Learning

In this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common approach to reinforcement learning which…

Machine Learning · Computer Science 2017-07-24 Marc G. Bellemare , Will Dabney , Rémi Munos

Policy Gradient Methods for Off-policy Control

Off-policy learning refers to the problem of learning the value function of a way of behaving, or policy, while following a different policy. Gradient-based off-policy learning algorithms, such as GTD and TDC/GQ, converge even when using…

Artificial Intelligence · Computer Science 2015-12-15 Lucas Lehnert , Doina Precup

Policy Gradients for Probabilistic Constrained Reinforcement Learning

This paper considers the problem of learning safe policies in the context of reinforcement learning (RL). In particular, we consider the notion of probabilistic safety. This is, we aim to design policies that maintain the state of the…

Machine Learning · Computer Science 2023-04-20 Weiqin Chen , Dharmashankar Subramanian , Santiago Paternain

Identifying Policy Gradient Subspaces

Policy gradient methods hold great potential for solving complex continuous control tasks. Still, their training efficiency can be improved by exploiting structure within the optimization problem. Recent work indicates that supervised…

Machine Learning · Computer Science 2024-03-19 Jan Schneider , Pierre Schumacher , Simon Guist , Le Chen , Daniel Häufle , Bernhard Schölkopf , Dieter Büchler

Policy Gradient using Weak Derivatives for Reinforcement Learning

This paper considers policy search in continuous state-action reinforcement learning problems. Typically, one computes search directions using a classic expression for the policy gradient called the Policy Gradient Theorem, which decomposes…

Machine Learning · Computer Science 2020-04-13 Sujay Bhatt , Alec Koppel , Vikram Krishnamurthy

Generalizable control for quantum parameter estimation through reinforcement learning

Measurement and estimation of parameters are essential for science and engineering, where one of the main quests is to find systematic schemes that can achieve high precision. While conventional schemes for quantum parameter estimation…

Quantum Physics · Physics 2021-04-29 Han Xu , Junning Li , Liqiang Liu , Yu Wang , Haidong Yuan , Xin Wang

Reinforcement Learning with an Abrupt Model Change

The problem of reinforcement learning is considered where the environment or the model undergoes a change. An algorithm is proposed that an agent can apply in such a problem to achieve the optimal long-time discounted reward. The algorithm…

Systems and Control · Electrical Eng. & Systems 2023-04-25 Wuxia Chen , Taposh Banerjee , Jemin George , Carl Busart

Reinforcement Learning in Economics and Finance

Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process, through repeated experience. In a given environment, the agent policy provides him some running and terminal…

Theoretical Economics · Economics 2020-03-24 Arthur Charpentier , Romuald Elie , Carl Remlinger

A View on Deep Reinforcement Learning in System Optimization

Many real-world systems problems require reasoning about the long term consequences of actions taken to configure and manage the system. These problems with delayed and often sequentially aggregated reward, are often inherently…

Machine Learning · Computer Science 2019-09-06 Ameer Haj-Ali , Nesreen K. Ahmed , Ted Willke , Joseph Gonzalez , Krste Asanovic , Ion Stoica