Related papers: Conservative Optimistic Policy Optimization via Mu…

200 papers

A precondition for the deployment of a Reinforcement Learning agent to a real-world system is to provide guarantees on the learning process. While a learning algorithm will eventually converge to a good policy, there are no guarantees on…

Machine Learning · Statistics 2023-12-27 Paul Daoudi, Mathias Formoso, Othman Gaizi, Achraf Azize, Evrard Garcelon

Offline learning of strategies takes data efficiency to its extreme by restricting algorithms to a fixed dataset of state-action trajectories. We consider the problem in a mixed-motive multiagent setting, where the goal is to solve a game…

Artificial Intelligence · Computer Science 2026-03-03 Austin A. Nguyen, Michael P. Wellman

Policy optimization methods are one of the most widely used classes of Reinforcement Learning (RL) algorithms. Yet, so far, such methods have been mostly analyzed from an optimization perspective, without addressing the problem of…

Machine Learning · Computer Science 2020-06-19 Yonathan Efroni, Lior Shani, Aviv Rosenberg, Shie Mannor

We study offline reinforcement learning (RL) which seeks to learn a good policy based on a fixed, pre-collected dataset. A fundamental challenge behind this task is the distributional shift due to the dataset lacking sufficient exploration,…

Machine Learning · Computer Science 2023-10-11 Wenzhuo Zhou

We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reductions-based algorithms exhibit local convergence in the function space, as opposed to…

Machine Learning · Computer Science 2023-01-26 Naman Agarwal, Brian Bullins, Karan Singh

Reinforcement learning (RL) in large environments often suffers from severe computational bottlenecks, as conventional regret minimization algorithms require repeated, costly calls to planning and statistical estimation oracles. While…

Machine Learning · Computer Science 2026-05-04 Haichen Hu, Jian Qian, David Simchi-Levi

The applicability of reinforcement learning (RL) algorithms in real-world domains often requires adherence to safety constraints, a need difficult to address given the asymptotic nature of the classic RL optimization objective. In contrast…

Machine Learning · Computer Science 2021-04-15 Moritz A. Zanger, Karam Daaboul, J. Marius Zöllner

Model-based algorithms, which learn a dynamics model from logged experience and perform some sort of pessimistic planning under the learned model, have emerged as a promising paradigm for offline reinforcement learning (offline RL).…

Machine Learning · Computer Science 2022-01-28 Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn

One of the fundamental challenges for offline reinforcement learning (RL) is ensuring robustness to data distribution. Whether the data originates from a near-optimal policy or not, we anticipate that an algorithm should demonstrate its…

Machine Learning · Computer Science 2023-10-18 Xiaohan Hu, Yi Ma, Chenjun Xiao, Yan Zheng, Jianye Hao

The goal of robust constrained reinforcement learning (RL) is to optimize an agent's performance under the worst-case model uncertainty while satisfying safety or resource constraints. In this paper, we demonstrate that strong duality does…

Machine Learning · Computer Science 2025-09-23 Shaocong Ma, Ziyi Chen, Yi Zhou, Heng Huang

In offline reinforcement learning, a policy learns to maximize cumulative rewards with a fixed collection of data. Towards conservative strategy, current methods choose to regularize the behavior policy or learn a lower bound of the value…

Machine Learning · Computer Science 2021-03-02 Hongchang Zhang, Jianzhun Shao, Yuhang Jiang, Shuncheng He, Xiangyang Ji

Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives or constraints in the policy optimization step.…

Policy Optimization (PO) is a widely used approach to address continuous control tasks. In this paper, we introduce the notion of mediator feedback that frames PO as an online learning problem over the policy space. The additional available…

Machine Learning · Computer Science 2020-12-16 Alberto Maria Metelli, Matteo Papini, Pierluca D'Oro, Marcello Restelli

Offline reinforcement learning (RL) aims to find performant policies from logged data without further environment interaction. Model-based algorithms, which learn a model of the environment from the dataset and perform conservative policy…

Machine Learning · Computer Science 2022-10-12 Marc Rigter, Bruno Lacerda, Nick Hawes

In this paper, we study offline preference-based reinforcement learning (PbRL), where learning is based on pre-collected preference feedback over pairs of trajectories. While offline PbRL has demonstrated remarkable empirical success,…

Machine Learning · Computer Science 2025-06-04 Hyungkyu Kang, Min-hwan Oh

Recent breakthroughs both in reinforcement learning and trajectory optimization have made significant advances towards real world robotic system deployment. Reinforcement learning (RL) can be applied to many problems without needing any…

Robotics · Computer Science 2019-10-23 Guillaume Bellegarda, Katie Byl

Offline reinforcement learning (RL) provides a promising direction to exploit massive amount of offline data for complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be…

Machine Learning · Computer Science 2022-10-25 Rui Yang, Chenjia Bai, Xiaoteng Ma, Zhaoran Wang, Chongjie Zhang, Lei Han

While policy-based reinforcement learning (RL) achieves tremendous successes in practice, it is significantly less understood in theory, especially compared with value-based RL. In particular, it remains elusive how to design a provably…

Machine Learning · Computer Science 2024-04-02 Qi Cai, Zhuoran Yang, Chi Jin, Zhaoran Wang

The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL). To be successful, an optimistic RL algorithm must over-estimate…

Machine Learning · Computer Science 2021-12-07 Aldo Pacchiano, Philip J. Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts

Self-play, where the algorithm learns by playing against itself without requiring any direct supervision, has become the new weapon in modern Reinforcement Learning (RL) for achieving superhuman performance in practice. However, the…

Machine Learning · Computer Science 2020-07-10 Yu Bai, Chi Jin