Related papers: Conservative Optimistic Policy Optimization via Mu…

Conservative Exploration for Policy Optimization via Off-Policy Policy Evaluation

A precondition for the deployment of a Reinforcement Learning agent to a real-world system is to provide guarantees on the learning process. While a learning algorithm will eventually converge to a good policy, there are no guarantees on…

Machine Learning · Statistics 2023-12-27 Paul Daoudi, Mathias Formoso, Othman Gaizi, Achraf Azize, Evrard Garcelon

Conservative Equilibrium Discovery in Offline Game-Theoretic Multiagent Reinforcement Learning

Offline learning of strategies takes data efficiency to its extreme by restricting algorithms to a fixed dataset of state-action trajectories. We consider the problem in a mixed-motive multiagent setting, where the goal is to solve a game…

Artificial Intelligence · Computer Science 2026-03-03 Austin A. Nguyen, Michael P. Wellman

Optimistic Policy Optimization with Bandit Feedback

Policy optimization methods are one of the most widely used classes of Reinforcement Learning (RL) algorithms. Yet, so far, such methods have been mostly analyzed from an optimization perspective, without addressing the problem of…

Machine Learning · Computer Science 2020-06-19 Yonathan Efroni, Lior Shani, Aviv Rosenberg, Shie Mannor

Bi-Level Offline Policy Optimization with Limited Exploration

We study offline reinforcement learning (RL) which seeks to learn a good policy based on a fixed, pre-collected dataset. A fundamental challenge behind this task is the distributional shift due to the dataset lacking sufficient exploration,…

Machine Learning · Computer Science 2023-10-11 Wenzhuo Zhou

Variance-Reduced Conservative Policy Iteration

We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reductions-based algorithms exhibit local convergence in the function space, as opposed to…

Machine Learning · Computer Science 2023-01-26 Naman Agarwal, Brian Bullins, Karan Singh

Model-Based Reinforcement Learning with Double Oracle Efficiency in Policy Optimization and Offline Estimation

Reinforcement learning (RL) in large environments often suffers from severe computational bottlenecks, as conventional regret minimization algorithms require repeated, costly calls to planning and statistical estimation oracles. While…

Machine Learning · Computer Science 2026-05-04 Haichen Hu, Jian Qian, David Simchi-Levi

Safe Continuous Control with Constrained Model-Based Policy Optimization

The applicability of reinforcement learning (RL) algorithms in real-world domains often requires adherence to safety constraints, a need difficult to address given the asymptotic nature of the classic RL optimization objective. In contrast…

Machine Learning · Computer Science 2021-04-15 Moritz A. Zanger, Karam Daaboul, J. Marius Zöllner

COMBO: Conservative Offline Model-Based Policy Optimization

Model-based algorithms, which learn a dynamics model from logged experience and perform some sort of pessimistic planning under the learned model, have emerged as a promising paradigm for offline reinforcement learning (offline RL).…

Machine Learning · Computer Science 2022-01-28 Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn

Iteratively Refined Behavior Regularization for Offline Reinforcement Learning

One of the fundamental challenges for offline reinforcement learning (RL) is ensuring robustness to data distribution. Whether the data originates from a near-optimal policy or not, we anticipate that an algorithm should demonstrate its…

Machine Learning · Computer Science 2023-10-18 Xiaohan Hu, Yi Ma, Chenjun Xiao, Yan Zheng, Jianye Hao

Rectified Robust Policy Optimization for Model-Uncertain Constrained Reinforcement Learning without Strong Duality

The goal of robust constrained reinforcement learning (RL) is to optimize an agent's performance under the worst-case model uncertainty while satisfying safety or resource constraints. In this paper, we demonstrate that strong duality does…

Machine Learning · Computer Science 2025-09-23 Shaocong Ma, Ziyi Chen, Yi Zhou, Heng Huang

Reducing Conservativeness Oriented Offline Reinforcement Learning

In offline reinforcement learning, a policy learns to maximize cumulative rewards with a fixed collection of data. Towards conservative strategy, current methods choose to regularize the behavior policy or learn a lower bound of the value…

Machine Learning · Computer Science 2021-03-02 Hongchang Zhang, Jianzhun Shao, Yuhang Jiang, Shuncheng He, Xiangyang Ji

On Multi-objective Policy Optimization as a Tool for Reinforcement Learning: Case Studies in Offline RL and Finetuning

Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives or constraints in the policy optimization step.…

Machine Learning · Computer Science 2023-08-02 Abbas Abdolmaleki, Sandy H. Huang, Giulia Vezzani, Bobak Shahriari, Jost Tobias Springenberg, Shruti Mishra, Dhruva TB, Arunkumar Byravan, Konstantinos Bousmalis, Andras Gyorgy, Csaba Szepesvari, Raia Hadsell, Nicolas Heess, Martin Riedmiller

Policy Optimization as Online Learning with Mediator Feedback

Policy Optimization (PO) is a widely used approach to address continuous control tasks. In this paper, we introduce the notion of mediator feedback that frames PO as an online learning problem over the policy space. The additional available…

Machine Learning · Computer Science 2020-12-16 Alberto Maria Metelli, Matteo Papini, Pierluca D'Oro, Marcello Restelli

RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to find performant policies from logged data without further environment interaction. Model-based algorithms, which learn a model of the environment from the dataset and perform conservative policy…

Machine Learning · Computer Science 2022-10-12 Marc Rigter, Bruno Lacerda, Nick Hawes

Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning

In this paper, we study offline preference-based reinforcement learning (PbRL), where learning is based on pre-collected preference feedback over pairs of trajectories. While offline PbRL has demonstrated remarkable empirical success,…

Machine Learning · Computer Science 2025-06-04 Hyungkyu Kang, Min-hwan Oh

Combining Benefits from Trajectory Optimization and Deep Reinforcement Learning

Recent breakthroughs both in reinforcement learning and trajectory optimization have made significant advances towards real world robotic system deployment. Reinforcement learning (RL) can be applied to many problems without needing any…

Robotics · Computer Science 2019-10-23 Guillaume Bellegarda, Katie Byl

RORL: Robust Offline Reinforcement Learning via Conservative Smoothing

Offline reinforcement learning (RL) provides a promising direction to exploit massive amount of offline data for complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be…

Machine Learning · Computer Science 2022-10-25 Rui Yang, Chenjia Bai, Xiaoteng Ma, Zhaoran Wang, Chongjie Zhang, Lei Han

Provably Efficient Exploration in Policy Optimization

While policy-based reinforcement learning (RL) achieves tremendous successes in practice, it is significantly less understood in theory, especially compared with value-based RL. In particular, it remains elusive how to design a provably…

Machine Learning · Computer Science 2024-04-02 Qi Cai, Zhuoran Yang, Chi Jin, Zhaoran Wang

Towards Tractable Optimism in Model-Based Reinforcement Learning

The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL). To be successful, an optimistic RL algorithm must over-estimate…

Machine Learning · Computer Science 2021-12-07 Aldo Pacchiano, Philip J. Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts

Provable Self-Play Algorithms for Competitive Reinforcement Learning

Self-play, where the algorithm learns by playing against itself without requiring any direct supervision, has become the new weapon in modern Reinforcement Learning (RL) for achieving superhuman performance in practice. However, the…

Machine Learning · Computer Science 2020-07-10 Yu Bai, Chi Jin