Related papers: Improper Reinforcement Learning with Gradient-base…

Actor-Critic based Improper Reinforcement Learning

We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform…

Machine Learning · Computer Science 2022-07-20 Mohammadi Zaki , Avinash Mohan , Aditya Gopalan , Shie Mannor

Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods

In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem. The…

Machine Learning · Computer Science 2012-06-26 Gergely Neu , Csaba Szepesvari

Inverse Reinforcement Learning from a Gradient-based Learner

Inverse Reinforcement Learning addresses the problem of inferring an expert's reward function from demonstrations. However, in many applications, we not only have access to the expert's near-optimal behavior, but we also observe part of her…

Machine Learning · Computer Science 2021-09-03 Giorgia Ramponi , Gianluca Drappo , Marcello Restelli

A Hybrid Approach for Reinforcement Learning Using Virtual Policy Gradient for Balancing an Inverted Pendulum

Using the policy gradient algorithm, we train a single-hidden-layer neural network to balance a physically accurate simulation of a single inverted pendulum. The trained weights and biases can then be transferred to a physical agent, where…

Machine Learning · Computer Science 2021-02-17 Dylan Bates

Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies

Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and…

Optimization and Control · Mathematics 2022-10-11 Bin Hu , Kaiqing Zhang , Na Li , Mehran Mesbahi , Maryam Fazel , Tamer Başar

Policy Gradient Method For Robust Reinforcement Learning

This paper develops the first policy gradient method with global optimality guarantee and complexity analysis for robust reinforcement learning under model mismatch. Robust reinforcement learning is to learn a policy robust to model…

Machine Learning · Computer Science 2022-05-17 Yue Wang , Shaofeng Zou

Policy Gradient for Continuing Tasks in Non-stationary Markov Decision Processes

Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities. In this paper we consider the problem of finding optimal…

Machine Learning · Computer Science 2020-10-19 Santiago Paternain , Juan Andres Bazerque , Alejandro Ribeiro

On the Convergence and Optimality of Policy Gradient for Markov Coherent Risk

In order to model risk aversion in reinforcement learning, an emerging line of research adapts familiar algorithms to optimize coherent risk functionals, a class that includes conditional value-at-risk (CVaR). Because optimizing the…

Machine Learning · Computer Science 2021-03-09 Audrey Huang , Liu Leqi , Zachary C. Lipton , Kamyar Azizzadenesheli

Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling

We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay. Modern communication systems are becoming increasingly complex, and are required to handle multiple types of traffic with widely…

Machine Learning · Computer Science 2021-05-04 Mohammani Zaki , Avi Mohan , Aditya Gopalan , Shie Mannor

Trajectory-Based Off-Policy Deep Reinforcement Learning

Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently…

Machine Learning · Computer Science 2019-05-15 Andreas Doerr , Michael Volpp , Marc Toussaint , Sebastian Trimpe , Christian Daniel

Policy Gradient Methods for Distortion Risk Measures

We propose policy gradient algorithms which learn risk-sensitive policies in a reinforcement learning (RL) framework. Our proposed algorithms maximize the distortion risk measure (DRM) of the cumulative reward in an episodic Markov decision…

Machine Learning · Computer Science 2024-02-06 Nithia Vijayan , Prashanth L. A

Policy Optimization with Differentiable MPC: Convergence Analysis under Uncertainty

Model-based policy optimization is a well-established framework for designing reliable and high-performance controllers across a wide range of control applications. Recently, this approach has been extended to model predictive control…

Systems and Control · Electrical Eng. & Systems 2026-04-15 Riccardo Zuliani , Efe C. Balta , John Lygeros

Safe Model-based Reinforcement Learning with Stability Guarantees

Reinforcement learning is a powerful paradigm for learning optimal policies from experimental data. However, to find optimal policies, most reinforcement learning algorithms explore all possible actions, which may be harmful for real-world…

Machine Learning · Statistics 2017-11-15 Felix Berkenkamp , Matteo Turchetta , Angela P. Schoellig , Andreas Krause

On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift

Policy gradient methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces. However, little is known about even their most basic theoretical convergence properties,…

Machine Learning · Computer Science 2020-10-16 Alekh Agarwal , Sham M. Kakade , Jason D. Lee , Gaurav Mahajan

Safe and Balanced: A Framework for Constrained Multi-Objective Reinforcement Learning

In numerous reinforcement learning (RL) problems involving safety-critical systems, a key challenge lies in balancing multiple objectives while simultaneously meeting all stringent safety constraints. To tackle this issue, we propose a…

Artificial Intelligence · Computer Science 2024-05-28 Shangding Gu , Bilgehan Sel , Yuhao Ding , Lu Wang , Qingwei Lin , Alois Knoll , Ming Jin

Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients

We study reinforcement learning in hybrid discrete-continuous action spaces, such as settings where the discrete component selects a regime (or index) and the continuous component optimizes within it -- a structure common in robotics,…

Machine Learning · Computer Science 2026-05-15 Matias Alvo , Daniel Russo , Yash Kanoria

A policy gradient approach for optimization of smooth risk measures

We propose policy gradient algorithms for solving a risk-sensitive reinforcement learning (RL) problem in on-policy as well as off-policy settings. We consider episodic Markov decision processes, and model the risk using the broad class of…

Machine Learning · Computer Science 2024-06-25 Nithia Vijayan , Prashanth L. A

Policy Gradient for Rectangular Robust Markov Decision Processes

Policy gradient methods have become a standard for training reinforcement learning agents in a scalable and efficient manner. However, they do not account for transition uncertainty, whereas learning robust policies can be computationally…

Machine Learning · Computer Science 2023-12-12 Navdeep Kumar , Esther Derman , Matthieu Geist , Kfir Levy , Shie Mannor

On the Performance of Maximum Likelihood Inverse Reinforcement Learning

Inverse reinforcement learning (IRL) addresses the problem of recovering a task description given a demonstration of the optimal policy used to solve such a task. The optimal policy is usually provided by an expert or teacher, making IRL…

Machine Learning · Computer Science 2012-02-09 Héctor Ratia , Luis Montesano , Ruben Martinez-Cantin

DiffOP: Reinforcement Learning of Optimization-Based Control Policies via Implicit Policy Gradients

Real-world control systems require policies that are not only high-performing but also interpretable and robust. A promising direction toward this goal is model-based control, which learns system dynamics and cost functions from historical…

Systems and Control · Electrical Eng. & Systems 2025-11-20 Yuexin Bian , Jie Feng , Yuanyuan Shi