Related papers: Policy Gradient Bayesian Robust Optimization for I…

Bayesian Robust Optimization for Imitation Learning

One of the main challenges in imitation learning is determining what action an agent should take when outside the state distribution of the demonstrations. Inverse reinforcement learning (IRL) can enable generalization to new states by…

Machine Learning · Computer Science 2024-03-04 Daniel S. Brown, Scott Niekum, Marek Petrik

Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

Humans are masters at quickly learning many complex tasks, relying on an approximate understanding of the dynamics of their environments. In much the same way, we would like our learning agents to quickly adapt to new tasks. In this paper,…

Robotics · Computer Science 2019-10-10 Arunkumar Byravan, Jost Tobias Springenberg, Abbas Abdolmaleki, Roland Hafner, Michael Neunert, Thomas Lampe, Noah Siegel, Nicolas Heess, Martin Riedmiller

On the Performance of Maximum Likelihood Inverse Reinforcement Learning

Inverse reinforcement learning (IRL) addresses the problem of recovering a task description given a demonstration of the optimal policy used to solve such a task. The optimal policy is usually provided by an expert or teacher, making IRL…

Machine Learning · Computer Science 2012-02-09 Héctor Ratia, Luis Montesano, Ruben Martinez-Cantin

Stabilizing Policy Gradient Methods via Reward Profiling

Policy gradient methods, which have been extensively studied in the last decade, offer an effective and efficient framework for reinforcement learning problems. However, their performances can often be unsatisfactory, suffering from…

Machine Learning · Computer Science 2026-01-27 Shihab Ahmed, El Houcine Bergou, Aritra Dutta, Yue Wang

Towards Principled, Practical Policy Gradient for Bandits and Tabular MDPs

We consider (stochastic) softmax policy gradient (PG) methods for bandits and tabular Markov decision processes (MDPs). While the PG objective is non-concave, recent research has used the objective's smoothness and gradient domination…

Machine Learning · Computer Science 2024-10-01 Michael Lu, Matin Aghaei, Anant Raj, Sharan Vaswani

A policy gradient approach for optimization of smooth risk measures

We propose policy gradient algorithms for solving a risk-sensitive reinforcement learning (RL) problem in on-policy as well as off-policy settings. We consider episodic Markov decision processes, and model the risk using the broad class of…

Machine Learning · Computer Science 2024-06-25 Nithia Vijayan, Prashanth L. A

Policy Gradient Method For Robust Reinforcement Learning

This paper develops the first policy gradient method with global optimality guarantee and complexity analysis for robust reinforcement learning under model mismatch. Robust reinforcement learning is to learn a policy robust to model…

Machine Learning · Computer Science 2022-05-17 Yue Wang, Shaofeng Zou

Smoothing Policies and Safe Policy Gradients

Policy Gradient (PG) algorithms are among the best candidates for the much-anticipated applications of reinforcement learning to real-world control tasks, such as robotics. However, the trial-and-error nature of these methods poses safety…

Machine Learning · Computer Science 2022-06-20 Matteo Papini, Matteo Pirotta, Marcello Restelli

Probability Density Estimation Based Imitation Learning

Imitation Learning (IL) is an effective learning paradigm exploiting the interactions between agents and environments. It does not require explicit reward signals and instead tries to recover desired policies using expert demonstrations. In…

Machine Learning · Computer Science 2021-12-14 Yang Liu, Yongzhe Chang, Shilei Jiang, Xueqian Wang, Bin Liang, Bo Yuan

Coherent Soft Imitation Learning

Imitation learning methods seek to learn from an expert either through behavioral cloning (BC) of the policy or inverse reinforcement learning (IRL) of the reward. Such methods enable agents to learn complex tasks from humans that are…

Machine Learning · Computer Science 2023-12-07 Joe Watson, Sandy H. Huang, Nicolas Heess

Policy Gradients for Probabilistic Constrained Reinforcement Learning

This paper considers the problem of learning safe policies in the context of reinforcement learning (RL). In particular, we consider the notion of probabilistic safety. That is, we aim to design policies that maintain the state of the…

Machine Learning · Computer Science 2023-04-20 Weiqin Chen, Dharmashankar Subramanian, Santiago Paternain

$f$-Policy Gradients: A General Framework for Goal Conditioned RL using $f$-Divergences

Goal-Conditioned Reinforcement Learning (RL) problems often have access to sparse rewards where the agent receives a reward signal only when it has achieved the goal, making policy optimization a difficult problem. Several works augment…

Machine Learning · Computer Science 2023-10-11 Siddhant Agarwal, Ishan Durugkar, Peter Stone, Amy Zhang

PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning

Direct policy gradient methods for reinforcement learning are a successful approach for a variety of reasons: they are model free, they directly optimize the performance metric of interest, and they allow for richly parameterized policies.…

Machine Learning · Computer Science 2020-08-14 Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun

Soft-Robust Algorithms for Batch Reinforcement Learning

In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion, which minimizes the probability of a catastrophic failure. Unfortunately,…

Machine Learning · Computer Science 2021-03-01 Elita A. Lobo, Mohammad Ghavamzadeh, Marek Petrik

Balance Equation-based Distributionally Robust Offline Imitation Learning

Imitation Learning (IL) has proven highly effective for robotic and control tasks where manually designing reward functions or explicit controllers is infeasible. However, standard IL methods implicitly assume that the environment dynamics…

Machine Learning · Computer Science 2025-11-12 Rishabh Agrawal, Yusuf Alvi, Rahul Jain, Ashutosh Nayyar

Provable Reward-Agnostic Preference-Based Reinforcement Learning

Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories, rather than explicit reward signals. While PbRL has demonstrated…

Machine Learning · Computer Science 2024-04-18 Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees

Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the…

Machine Learning · Computer Science 2026-02-10 Sourav Ganguly, Kishan Panaganti, Arnob Ghosh, Adam Wierman

Learning Diverse Policies with Soft Self-Generated Guidance

Reinforcement learning (RL) with sparse and deceptive rewards is challenging because non-zero rewards are rarely obtained. Hence, the gradient calculated by the agent can be stochastic and without valid information. Recent studies that…

Machine Learning · Computer Science 2024-02-08 Guojian Wang, Faguo Wu, Xiao Zhang, Jianxiang Liu

Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization

The problem of inverse reinforcement learning (IRL) is relevant to a variety of tasks including value alignment and robot learning from demonstration. Despite significant algorithmic contributions in recent years, IRL remains an ill-posed…

Machine Learning · Computer Science 2020-11-18 Sreejith Balakrishnan, Quoc Phong Nguyen, Bryan Kian Hsiang Low, Harold Soh

Sequential Bayesian Optimal Experimental Design in Infinite Dimensions via Policy Gradient Reinforcement Learning

Sequential Bayesian optimal experimental design (SBOED) for PDE-governed inverse problems is computationally challenging, especially for infinite-dimensional random field parameters. High-fidelity approaches require repeated forward and…

Optimization and Control · Mathematics 2026-01-12 Kaichen Shen, Peng Chen