Related papers: On the function approximation error for risk-sensi…
We propose risk-sensitive reinforcement learning algorithms catering to three families of risk measures, namely expectiles, utility-based shortfall risk and optimized certainty equivalent risk. For each risk measure, in the context of a…
This work studies discrete-time discounted Markov decision processes with continuous state and action spaces and addresses the inverse problem of inferring a cost function from observed optimal behavior. We first consider the case in which…
In this paper we study a model-based approach to calculating approximately optimal policies in Markovian Decision Processes. In particular, we derive novel bounds on the loss of using a policy derived from a factored linear model, a class…
In the field of reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance. However, to our knowledge, no practical methods exist for determining high-confidence policy performance…
We study reinforcement learning methods with linear function approximation under non-Markov state and cost processes. We first consider the policy evaluation method and show that the algorithm converges under suitable ergodicity conditions…
Value function approximation is important in modern reinforcement learning (RL) problems especially when the state space is (infinitely) large. Despite the importance and wide applicability of value function approximation, its theoretical…
We study the computational complexity of approximating general constrained Markov decision processes. Our primary contribution is the design of a polynomial time $(0,\epsilon)$-additive bicriteria approximation algorithm for finding optimal…
In this paper, we consider the risk-sensitive cost criterion with exponentiated costs for Markov decision processes and develop a model-free policy gradient algorithm in this setting. Unlike additive cost criteria such as average or…
Euclidean Markov decision processes are a powerful tool for modeling control problems under uncertainty over continuous domains. Finite state imprecise, Markov decision processes can be used to approximate the behavior of these infinite…
While reinforcement learning has shown experimental success in a number of applications, it is known to be sensitive to noise and perturbations in the parameters of the system, leading to high variance in the total reward amongst different…
We derive an approximation error bound that holds simultaneously for a function and all its derivatives up to any prescribed order. The bounds apply to elementary functions, including multivariate polynomials, the exponential function, and…
This paper studies the safe reinforcement learning problem formulated as an episodic finite-horizon tabular constrained Markov decision process with an unknown transition kernel and stochastic reward and cost functions. We propose a…
We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes that involves the use of stochastic approximation algorithms along with state-of-the-art techniques that are…
We study reinforcement learning with linear function approximation and finite-memory approximations for partially observed Markov decision processes (POMDPs). We first present an algorithm for the value evaluation of finite-memory feedback…
In this work we address the problem of finding feasible policies for Constrained Markov Decision Processes under probability one constraints. We argue that stationary policies are not sufficient for solving this problem, and that a rich…
Value function approximation has demonstrated phenomenal empirical success in reinforcement learning (RL). Nevertheless, despite a handful of recent progress on developing theory for RL with linear function approximation, the understanding…
In the Bayesian approach to sequential decision making, exact calculation of the (subjective) utility is intractable. This extends to most special cases of interest, such as reinforcement learning problems. While utility bounds are known to…
Estimating the value function for a fixed policy is a fundamental problem in reinforcement learning. Policy evaluation algorithms---to estimate value functions---continue to be developed, to improve convergence rates, improve stability and…
We study risk-sensitive reinforcement learning in finite discounted MDPs, where a generative model of the MDP is assumed to be available. We consider a family or risk measures called the optimized certainty equivalent (OCE), which includes…
We consider the problem of learning the optimal policy for Markov decision processes with safety constraints. We formulate the problem in a reach-avoid setup. Our goal is to design online reinforcement learning algorithms that ensure safety…