Related papers: Rethinking Value Function Learning for Generalizat…

Learning Dynamics and Generalization in Reinforcement Learning

Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a potentially discontinuous value function, and generalizing well to new observations. In this paper, we analyze the learning dynamics of temporal…

Machine Learning · Computer Science 2022-06-07 Clare Lyle , Mark Rowland , Will Dabney , Marta Kwiatkowska , Yarin Gal

Phasic Policy Gradient

We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework which modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases. In prior methods, one must choose…

Machine Learning · Computer Science 2020-09-10 Karl Cobbe , Jacob Hilton , Oleg Klimov , John Schulman

Improving Deep Policy Gradients with Value Function Search

Deep Policy Gradient (PG) algorithms employ value networks to drive the learning of parameterized policies and reduce the variance of the gradient estimates. However, value function approximation gets stuck in local optima and struggles to…

Machine Learning · Computer Science 2023-02-21 Enrico Marchesini , Christopher Amato

Deterministic Value-Policy Gradients

Reinforcement learning algorithms such as the deep deterministic policy gradient algorithm (DDPG) has been widely used in continuous control tasks. However, the model-free DDPG algorithm suffers from high sample complexity. In this paper we…

Machine Learning · Computer Science 2019-11-14 Qingpeng Cai , Ling Pan , Pingzhong Tang

Deterministic Policy Gradient for Reinforcement Learning with Continuous Time and State

The theory of continuous-time reinforcement learning (RL) has progressed rapidly in recent years. While the ultimate objective of RL is typically to learn deterministic control policies, most existing continuous-time RL methods rely on…

Machine Learning · Computer Science 2026-03-17 Ziheng Cheng , Xin Guo , Yufei Zhang

Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

Humans are masters at quickly learning many complex tasks, relying on an approximate understanding of the dynamics of their environments. In much the same way, we would like our learning agents to quickly adapt to new tasks. In this paper,…

Robotics · Computer Science 2019-10-10 Arunkumar Byravan , Jost Tobias Springenberg , Abbas Abdolmaleki , Roland Hafner , Michael Neunert , Thomas Lampe , Noah Siegel , Nicolas Heess , Martin Riedmiller

DiGrad: Multi-Task Reinforcement Learning with Shared Actions

Most reinforcement learning algorithms are inefficient for learning multiple tasks in complex robotic systems, where different tasks share a set of actions. In such environments a compound policy may be learnt with shared neural network…

Machine Learning · Computer Science 2018-03-01 Parijat Dewangan , S Phaniteja , K Madhava Krishna , Abhishek Sarkar , Balaraman Ravindran

Reinforcement Learning by Value Gradients

The concept of the value-gradient is introduced and developed in the context of reinforcement learning. It is shown that by learning the value-gradients exploration or stochastic behaviour is no longer needed to find locally optimal…

Neural and Evolutionary Computing · Computer Science 2008-03-26 Michael Fairbank

Directed Policy Gradient for Safe Reinforcement Learning with Human Advice

Many currently deployed Reinforcement Learning agents work in an environment shared with humans, be them co-workers, users or clients. It is desirable that these agents adjust to people's preferences, learn faster thanks to their help, and…

Machine Learning · Computer Science 2018-08-14 Hélène Plisnier , Denis Steckelmacher , Tim Brys , Diederik M. Roijers , Ann Nowé

DNA: Proximal Policy Optimization with a Dual Network Architecture

This paper explores the problem of simultaneously learning a value function and policy in deep actor-critic reinforcement learning models. We find that the common practice of learning these functions jointly is sub-optimal, due to an…

Machine Learning · Computer Science 2022-11-15 Matthew Aitchison , Penny Sweetser

Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence

Risk-sensitive reinforcement learning (RL) is crucial for maintaining reliable performance in high-stakes applications. While traditional RL methods aim to learn a point estimate of the random cumulative cost, distributional RL (DRL) seeks…

Machine Learning · Computer Science 2025-02-03 Minheng Xiao , Xian Yu , Lei Ying

A Policy Efficient Reduction Approach to Convex Constrained Deep Reinforcement Learning

Although well-established in general reinforcement learning (RL), value-based methods are rarely explored in constrained RL (CRL) for their incapability of finding policies that can randomize among multiple actions. To apply value-based…

Machine Learning · Computer Science 2022-06-28 Tianchi Cai , Wenpeng Zhang , Lihong Gu , Xiaodong Zeng , Jinjie Gu

Deep Reinforcement Learning for Resource Constrained Multiclass Scheduling in Wireless Networks

The problem of resource constrained scheduling in a dynamic and heterogeneous wireless setting is considered here. In our setup, the available limited bandwidth resources are allocated in order to serve randomly arriving service demands,…

Machine Learning · Computer Science 2022-04-01 Apostolos Avranas , Marios Kountouris , Philippe Ciblat

General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States

Learning to evaluate and improve policies is a core problem of Reinforcement Learning (RL). Traditional RL algorithms learn a value function defined for a single policy. A recently explored competitive alternative is to learn a single value…

Machine Learning · Computer Science 2022-07-05 Francesco Faccio , Aditya Ramesh , Vincent Herrmann , Jean Harb , Jürgen Schmidhuber

Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization

Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy that maximizes the cumulative rewards in sequential decision making. Most of methods in the existing…

Machine Learning · Statistics 2023-01-06 Chengchun Shi , Zhengling Qi , Jianing Wang , Fan Zhou

Partial Policy Gradients for RL in LLMs

Reinforcement learning is a framework for learning to act sequentially in an unknown environment. We propose a natural approach for modeling policy structure in policy gradients. The key idea is to optimize for a subset of future rewards:…

Machine Learning · Computer Science 2026-03-09 Puneet Mathur , Branislav Kveton , Subhojyoti Mukherjee , Viet Dac Lai

Improving Generalization in Reinforcement Learning with Mixture Regularization

Deep reinforcement learning (RL) agents trained in a limited set of environments tend to suffer overfitting and fail to generalize to unseen testing environments. To improve their generalizability, data augmentation approaches (e.g. cutout…

Machine Learning · Computer Science 2020-10-22 Kaixin Wang , Bingyi Kang , Jie Shao , Jiashi Feng

Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling

We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay. Modern communication systems are becoming increasingly complex, and are required to handle multiple types of traffic with widely…

Machine Learning · Computer Science 2021-05-04 Mohammani Zaki , Avi Mohan , Aditya Gopalan , Shie Mannor

Evolved Policy Gradients

We propose a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent, which optimizes its policy to minimize this loss, will achieve…

Machine Learning · Computer Science 2018-05-01 Rein Houthooft , Richard Y. Chen , Phillip Isola , Bradly C. Stadie , Filip Wolski , Jonathan Ho , Pieter Abbeel

FlowPG: Action-constrained Policy Gradient with Normalizing Flows

Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical and resource-allocation related decision making problems. A major challenge in ACRL is to ensure agent taking a valid action satisfying…

Machine Learning · Computer Science 2024-02-09 Janaka Chathuranga Brahmanage , Jiajing Ling , Akshat Kumar