Related papers: Kernel Taylor-Based Value Function Approximation f…
We propose a diffusion approximation method for continuous-state Markov Decision Processes (MDPs) that can be utilized to address autonomous navigation and control in unstructured off-road environments. In contrast to most…
We consider policy evaluation in infinite-horizon discounted Markov decision problems (MDPs) with infinite spaces. We reformulate this task as a compositional stochastic program with a function-valued decision variable that belongs to a…
Robust Markov decision processes (MDPs) allow one to compute reliable solutions for dynamic decision problems whose evolution is modeled by rewards and partially known transition probabilities. Unfortunately, accounting for uncertainty in the…
Motion planning under uncertainty for an autonomous system can be formulated as a Markov Decision Process with a continuous state space. In this paper, we propose a novel solution to this decision-theoretic planning problem that directly…
A Markov decision process can be parameterized by a transition kernel and a reward function. Both play essential roles in the study of reinforcement learning, as evidenced by their presence in the Bellman equations. In our inquiry into various…
We seek to learn an effective policy for a Markov Decision Process (MDP) with continuous states via Q-Learning. Given a set of basis functions over state-action pairs, we search for a corresponding set of linear weights that minimizes the…
We study methods based on reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP). We study a regularized form of the kernel least-squares temporal difference (LSTD)…
Large-scale Markov decision processes (MDPs) require planning algorithms with runtime independent of the number of states of the MDP. We consider the planning problem in MDPs using linear value function approximation with only weak…
We present an efficient robust value iteration for s-rectangular robust Markov Decision Processes (MDPs) with a time complexity comparable to standard (non-robust) MDPs, which is significantly faster than any existing method. We do…
Value iteration is a well-known method of solving Markov Decision Processes (MDPs) that is simple to implement and boasts strong theoretical convergence guarantees. However, the computational cost of value iteration quickly becomes…
In many operations management problems, we need to make decisions sequentially to minimize the cost while satisfying certain constraints. One modeling approach for studying such problems is the constrained Markov decision process (CMDP). When…
Relational Markov Decision Processes are a useful abstraction for complex reinforcement learning problems and stochastic planning problems. Recent work developed representation schemes and algorithms for planning in such problems using the…
In this paper we provide faster algorithms for approximately solving discounted Markov Decision Processes in multiple parameter regimes. Given a discounted Markov Decision Process (DMDP) with $|S|$ states, $|A|$ actions, discount factor…
Value iteration is a commonly used and empirically competitive method in solving many Markov decision process problems. However, it is known that value iteration has only pseudo-polynomial complexity in general. We establish a somewhat…
In this paper, we consider risk-sensitive Markov Decision Processes (MDPs) with Borel state and action spaces and unbounded cost under both finite and infinite planning horizons. Our optimality criterion is based on the recursive…
Motivated by many application problems, we consider Markov decision processes (MDPs) with a general loss function and unknown parameters. To mitigate the epistemic uncertainty associated with unknown parameters, we take a Bayesian approach…
In this paper, we provide a novel algorithm for solving planning and learning problems of Markov decision processes. The proposed algorithm follows a policy iteration-type update by using a rank-one approximation of the transition…
In this work, we consider a cooperative multi-agent Markov decision process (MDP) involving m agents. At each decision epoch, all the m agents independently select actions in order to maximize a common long-term objective. In the policy…
Policy iteration and value iteration are at the core of many (approximate) dynamic programming methods. For Markov Decision Processes with finite state and action spaces, we show that they are instances of semismooth Newton-type methods to…
Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (or minimize…