Related papers: Structural Return Maximization for Reinforcement L…

Model Selection for Inverse Reinforcement Learning via Structural Risk Minimization

Inverse reinforcement learning (IRL) usually assumes the reward function model is pre-specified as a weighted sum of features and estimates the weighting parameters only. However, how to select features and determine a proper reward model…

Machine Learning · Computer Science 2025-04-01 Chendi Qu, Jianping He, Xiaoming Duan, Jiming Chen

Reinforcement Learning with Algorithms from Probabilistic Structure Estimation

Reinforcement learning (RL) algorithms aim to learn optimal decisions in unknown environments through experience of taking actions and observing the rewards gained. In some cases, the environment is not influenced by the actions of the RL…

Machine Learning · Computer Science 2022-06-02 Jonathan P. Epperlein, Roman Overko, Sergiy Zhuk, Christopher King, Djallel Bouneffouf, Andrew Cullen, Robert Shorten

BRPO: Batch Residual Policy Optimization

In batch reinforcement learning (RL), one often constrains a learned policy to be close to the behavior (data-generating) policy, e.g., by constraining the learned action distribution to differ from the behavior policy by some maximum…

Machine Learning · Computer Science 2020-03-31 Sungryull Sohn, Yinlam Chow, Jayden Ooi, Ofir Nachum, Honglak Lee, Ed Chi, Craig Boutilier

Reinforcement Learning algorithms for regret minimization in structured Markov Decision Processes

A recent goal in the Reinforcement Learning (RL) framework is to choose a sequence of actions or a policy to maximize the reward collected or minimize the regret incurred in a finite time horizon. For several RL problems in operation…

Machine Learning · Computer Science 2016-08-18 K J Prabuchandran, Tejas Bodas, Theja Tulabandhula

Provably Good Batch Reinforcement Learning Without Great Exploration

Batch reinforcement learning (RL) is important to apply RL algorithms to many high stakes tasks. Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions…

Machine Learning · Computer Science 2020-07-23 Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

STEEL: Singularity-aware Reinforcement Learning

Batch reinforcement learning (RL) aims at leveraging pre-collected data to find an optimal policy that maximizes the expected total rewards in a dynamic environment. The existing methods require absolutely continuous assumption (e.g., there…

Machine Learning · Statistics 2024-06-27 Xiaohong Chen, Zhengling Qi, Runzhe Wan

Continuous Doubly Constrained Batch Reinforcement Learning

Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL,…

Machine Learning · Computer Science 2021-12-07 Rasool Fakoor, Jonas Mueller, Kavosh Asadi, Pratik Chaudhari, Alexander J. Smola

Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning

This paper considers batch Reinforcement Learning (RL) with general value function approximation. Our study investigates the minimal assumptions to reliably estimate/minimize Bellman error, and characterizes the generalization performance…

Machine Learning · Computer Science 2021-03-26 Yaqi Duan, Chi Jin, Zhiyuan Li

Constrained Variational Policy Optimization for Safe Reinforcement Learning

Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before deploying them to safety-critical applications. Previous primal-dual style approaches suffer from instability issues and lack optimality…

Machine Learning · Computer Science 2022-06-20 Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Zhiwei Steven Wu, Bo Li, Ding Zhao

R2L: Reliable Reinforcement Learning: Guaranteed Return & Reliable Policies in Reinforcement Learning

In this work, we address the problem of determining reliable policies in reinforcement learning (RL), with a focus on optimization under uncertainty and the need for performance guarantees. While classical RL algorithms aim at maximizing…

Machine Learning · Computer Science 2025-10-22 Nadir Farhi

Linear Reinforcement Learning with Ball Structure Action Space

We study the problem of Reinforcement Learning (RL) with linear function approximation, i.e. assuming the optimal action-value function is linear in a known $d$-dimensional feature mapping. Unfortunately, however, based on only this…

Machine Learning · Computer Science 2022-11-15 Zeyu Jia, Randy Jia, Dhruv Madeka, Dean P. Foster

Programmatic Reinforcement Learning: Navigating Gridworlds

The field of reinforcement learning (RL) is concerned with algorithms for learning optimal policies in unknown stochastic environments. Programmatic RL studies representations of policies as programs, meaning involving higher order…

Machine Learning · Computer Science 2025-01-13 Guruprerana Shabadi, Nathanaël Fijalkow, Théo Matricon

Exploration via Planning for Information about the Optimal Trajectory

Many potential applications of reinforcement learning (RL) are stymied by the large numbers of samples required to learn an effective policy. This is especially true when applying RL to real-world control tasks, e.g. in the sciences or…

Machine Learning · Computer Science 2022-10-11 Viraj Mehta, Ian Char, Joseph Abbate, Rory Conlin, Mark D. Boyer, Stefano Ermon, Jeff Schneider, Willie Neiswanger

Probabilistic Satisfaction of Temporal Logic Constraints in Reinforcement Learning via Adaptive Policy-Switching

Constrained Reinforcement Learning (CRL) is a subset of machine learning that introduces constraints into the traditional reinforcement learning (RL) framework. Unlike conventional RL which aims solely to maximize cumulative rewards, CRL…

Artificial Intelligence · Computer Science 2024-12-02 Xiaoshan Lin, Sadık Bera Yüksel, Yasin Yazıcıoğlu, Derya Aksaray

Maximum Likelihood Reinforcement Learning

Reinforcement learning is the method of choice to train models in sampling-based setups with binary outcome feedback, such as navigation, code generation, and mathematical problem solving. In such settings, models implicitly induce a…

Machine Learning · Computer Science 2026-02-04 Fahim Tajwar, Guanning Zeng, Yueer Zhou, Yuda Song, Daman Arora, Yiding Jiang, Jeff Schneider, Ruslan Salakhutdinov, Haiwen Feng, Andrea Zanette

Oracle-Efficient Reinforcement Learning for Max Value Ensembles

Reinforcement learning (RL) in large or infinite state spaces is notoriously challenging, both theoretically (where worst-case sample and computational complexities must scale with state space cardinality) and experimentally (where function…

Machine Learning · Computer Science 2024-05-28 Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation. When there is a large number of candidate features, our result sheds light on the fact that…

Machine Learning · Computer Science 2020-11-10 Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

Soft-Robust Algorithms for Batch Reinforcement Learning

In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion, which minimizes the probability of a catastrophic failure. Unfortunately,…

Machine Learning · Computer Science 2021-03-01 Elita A. Lobo, Mohammad Ghavamzadeh, Marek Petrik

Stackelberg Batch Policy Learning

Batch reinforcement learning (RL) defines the task of learning from a fixed batch of data lacking exhaustive exploration. Worst-case optimality algorithms, which calibrate a value-function model class from logged experience and perform some…

Machine Learning · Statistics 2023-10-03 Wenzhuo Zhou, Annie Qu

Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes

To overcome the curses of dimensionality and modeling of Dynamic Programming (DP) methods to solve Markov Decision Process (MDP) problems, Reinforcement Learning (RL) methods are adopted in practice. Contrary to traditional RL algorithms…

Machine Learning · Computer Science 2021-08-24 Arghyadip Roy, Vivek Borkar, Abhay Karandikar, Prasanna Chaporkar