Related papers: Constrained Upper Confidence Reinforcement Learnin…

Tightening Exploration in Upper Confidence Reinforcement Learning

The upper confidence reinforcement learning (UCRL2) algorithm introduced in (Jaksch et al., 2010) is a popular method to perform regret minimization in unknown discrete Markov Decision Processes under the average-reward criterion. Despite…

Machine Learning · Computer Science 2021-04-14 Hippolyte Bourel , Odalric-Ambrym Maillard , Mohammad Sadegh Talebi

Learning in Markov Decision Processes under Constraints

We consider reinforcement learning (RL) in Markov Decision Processes in which an agent repeatedly interacts with an environment that is modeled by a controlled Markov process. At each time step $t$, it earns a reward, and also incurs a…

Machine Learning · Computer Science 2023-03-16 Rahul Singh , Abhishek Gupta , Ness B. Shroff

Fundamental Limits of Reinforcement Learning in Environment with Endogeneous and Exogeneous Uncertainty

Online reinforcement learning (RL) has been widely applied in information processing scenarios, which usually exhibit much uncertainty due to the intrinsic randomness of channels and service demands. In this paper, we consider an…

Machine Learning · Computer Science 2021-06-17 Rongpeng Li

Constrained Markov Decision Processes via Backward Value Functions

Although Reinforcement Learning (RL) algorithms have found tremendous success in simulated domains, they often cannot directly be applied to physical systems, especially in cases where there are hard constraints to satisfy (e.g. on safety…

Machine Learning · Computer Science 2020-08-28 Harsh Satija , Philip Amortila , Joelle Pineau

Learning Constrained Markov Decision Processes With Non-stationary Rewards and Constraints

In constrained Markov decision processes (CMDPs) with adversarial rewards and constraints, a well-known impossibility result prevents any algorithm from attaining both sublinear regret and sublinear constraint violation, when competing…

Machine Learning · Computer Science 2024-09-27 Francesco Emanuele Stradi , Anna Lunghi , Matteo Castiglioni , Alberto Marchesi , Nicola Gatti

Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss

We consider online learning for episodic stochastically constrained Markov decision processes (CMDPs), which plays a central role in ensuring the safety of reinforcement learning. Here the loss function can vary arbitrarily across the…

Machine Learning · Computer Science 2021-10-19 Shuang Qiu , Xiaohan Wei , Zhuoran Yang , Jieping Ye , Zhaoran Wang

Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities

We study model-based reinforcement learning in an unknown finite communicating Markov decision process. We propose a simple algorithm that leverages a variance based confidence interval. We show that the proposed algorithm, UCRL-V, achieves…

Machine Learning · Computer Science 2019-12-12 Aristide Tossou , Debabrota Basu , Christos Dimitrakakis

Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism

We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective…

Machine Learning · Computer Science 2020-06-26 Wang Chi Cheung , David Simchi-Levi , Ruihao Zhu

Learning Policies with Zero or Bounded Constraint Violation for Constrained MDPs

We address the issue of safety in reinforcement learning. We pose the problem in an episodic framework of a constrained Markov decision process. Existing results have shown that it is possible to achieve a reward regret of…

Machine Learning · Computer Science 2023-01-26 Tao Liu , Ruida Zhou , Dileep Kalathil , P. R. Kumar , Chao Tian

Provably Efficient Model-Free Constrained RL with Linear Function Approximation

We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based…

Machine Learning · Computer Science 2023-01-10 Arnob Ghosh , Xingyu Zhou , Ness Shroff

Delegative Reinforcement Learning: learning to avoid traps with a little help

Most known regret bounds for reinforcement learning are either episodic or assume an environment without traps. We derive a regret bound without making either assumption, by allowing the algorithm to occasionally delegate an action to an…

Machine Learning · Computer Science 2019-07-22 Vanessa Kosoy

Non-Stationary Reinforcement Learning: The Blessing of (More) Optimism

We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under temporal drifts, ie, both the reward and state transition distributions are allowed to evolve over time, as long as their respective total…

Machine Learning · Computer Science 2020-05-19 Wang Chi Cheung , David Simchi-Levi , Ruihao Zhu

Near-optimal Reinforcement Learning in Factored MDPs

Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and $S$ and $A$ are the cardinalities of the state and action…

Machine Learning · Statistics 2014-11-04 Ian Osband , Benjamin Van Roy

Robust Constrained Reinforcement Learning

Constrained reinforcement learning is to maximize the expected reward subject to constraints on utilities/costs. However, the training environment may not be the same as the test one, due to, e.g., modeling error, adversarial attack,…

Machine Learning · Computer Science 2022-09-16 Yue Wang , Fei Miao , Shaofeng Zou

Online Reinforcement Learning in Periodic MDP

We study learning in periodic Markov Decision Process (MDP), a special type of non-stationary MDP where both the state transition probabilities and reward functions vary periodically, under the average reward maximization setting. We…

Machine Learning · Computer Science 2023-03-20 Ayush Aniket , Arpan Chattopadhyay

Online Reinforcement Learning in Markov Decision Process Using Linear Programming

We consider online reinforcement learning in episodic Markov decision process (MDP) with unknown transition function and stochastic rewards drawn from some fixed but unknown distribution. The learner aims to learn the optimal policy and…

Machine Learning · Computer Science 2024-03-12 Vincent Leon , S. Rasoul Etesami

Improved Regret Bound for Safe Reinforcement Learning via Tighter Cost Pessimism and Reward Optimism

This paper studies the safe reinforcement learning problem formulated as an episodic finite-horizon tabular constrained Markov decision process with an unknown transition kernel and stochastic reward and cost functions. We propose a…

Machine Learning · Computer Science 2024-10-15 Kihyun Yu , Duksang Lee , William Overman , Dabeen Lee

Provably Efficient Exploration in Reward Machines with Low Regret

We study reinforcement learning (RL) for decision processes with non-Markovian reward, in which high-level knowledge of the task in the form of reward machines is available to the learner. We consider probabilistic reward machines with…

Machine Learning · Computer Science 2024-12-30 Hippolyte Bourel , Anders Jonsson , Odalric-Ambrym Maillard , Chenxiao Ma , Mohammad Sadegh Talebi

Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms

Reinforcement Learning (RL) serves as a versatile framework for sequential decision-making, finding applications across diverse domains such as robotics, autonomous driving, recommendation systems, supply chain optimization, biology,…

Machine Learning · Computer Science 2024-08-26 Vaneet Aggarwal , Washim Uddin Mondal , Qinbo Bai

Recursive Constraints to Prevent Instability in Constrained Reinforcement Learning

We consider the challenge of finding a deterministic policy for a Markov decision process that uniformly (in all states) maximizes one reward subject to a probabilistic constraint over a different reward. Existing solutions do not fully…

Machine Learning · Computer Science 2022-01-21 Jaeyoung Lee , Sean Sedwards , Krzysztof Czarnecki