Related papers: An Empirical Dynamic Programming Algorithm for Con…

Empirical Dynamic Programming

We propose empirical dynamic programming algorithms for Markov decision processes (MDPs). In these algorithms, the exact expectation in the Bellman operator in classical value iteration is replaced by an empirical estimate to get `empirical…

Optimization and Control · Mathematics 2013-11-26 William B. Haskell, Rahul Jain, Dileep Kalathil
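
The idea in the abstract above, replacing the exact expectation in the Bellman operator with an empirical estimate, can be illustrated with a minimal sketch. This is not the authors' algorithm as published: the function name `empirical_value_iteration`, the cost-minimization convention, the tabular `(S, A, S)` transition array used only as a simulator, and all default parameters are assumptions made for illustration.

```python
import numpy as np

def empirical_value_iteration(P, cost, gamma=0.9, n_samples=200, n_iters=100, seed=0):
    """Value iteration with the expectation replaced by a sample mean.

    P    : array of shape (S, A, S); used only to simulate next states (assumption).
    cost : array of shape (S, A); per-stage costs to be minimized (assumption).
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = P.shape
    v = np.zeros(n_states)
    for _ in range(n_iters):
        q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                # Draw fresh next-state samples and average v over them,
                # instead of computing the exact sum over P[s, a].
                next_states = rng.choice(n_states, size=n_samples, p=P[s, a])
                q[s, a] = cost[s, a] + gamma * v[next_states].mean()
        # Greedy (minimizing) step of the empirical Bellman operator.
        v = q.min(axis=1)
    return v
```

With deterministic transitions the sample mean equals the exact expectation, so the sketch reduces to classical value iteration; with stochastic transitions the iterates fluctuate around the exact fixed point, and the fluctuation shrinks as `n_samples` grows.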

Value Iteration with Guessing for Markov Chains and Markov Decision Processes

Two standard models for probabilistic systems are Markov chains (MCs) and Markov decision processes (MDPs). Classic objectives for such probabilistic models for control and planning problems are reachability and stochastic shortest path.…

Artificial Intelligence · Computer Science 2025-05-13 Krishnendu Chatterjee, Mahdi JafariRaviz, Raimundo Saona, Jakub Svoboda

Empirical Q-Value Iteration

We propose a new simple and natural algorithm for learning the optimal Q-value function of a discounted-cost Markov Decision Process (MDP) when the transition kernels are unknown. Unlike the classical learning algorithms for MDPs, such as…

Optimization and Control · Mathematics 2019-01-31 Dileep Kalathil, Vivek S. Borkar, Rahul Jain

Efficient Learning for Entropy-Regularized Markov Decision Processes via Multilevel Monte Carlo

Designing efficient learning algorithms with complexity guarantees for Markov decision processes (MDPs) with large or continuous state and action spaces remains a fundamental challenge. We address this challenge for entropy-regularized MDPs…

Machine Learning · Computer Science 2025-06-05 Matthieu Meunier, Christoph Reisinger, Yufei Zhang

Policy Evaluation in Continuous MDPs with Efficient Kernelized Gradient Temporal Difference

We consider policy evaluation in infinite-horizon discounted Markov decision problems (MDPs) with infinite spaces. We reformulate this task as a compositional stochastic program with a function-valued decision variable that belongs to a…

Optimization and Control · Mathematics 2020-05-19 Alec Koppel, Garrett Warnell, Ethan Stump, Peter Stone, Alejandro Ribeiro

Value Iteration for Long-run Average Reward in Markov Decision Processes

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long term performance. Value iteration (VI)…

Systems and Control · Computer Science 2017-09-01 Pranav Ashok, Krishnendu Chatterjee, Przemyslaw Daca, Jan Křetínský, Tobias Meggendorfer

Value iteration for approximate dynamic programming under convexity

This paper studies value iteration for infinite horizon contracting Markov decision processes under convexity assumptions and when the state space is uncountable. The original value iteration is replaced with a more tractable form and the…

Optimization and Control · Mathematics 2018-02-21 Jeremy Yee

An Accelerated Fitted Value Iteration Algorithm for MDPs with Finite and Vector-Valued Action Space

This paper studies an accelerated fitted value iteration (FVI) algorithm to solve high-dimensional Markov decision processes (MDPs). FVI is an approximate dynamic programming algorithm that has desirable theoretical properties. However, it…

Optimization and Control · Mathematics 2020-11-30 Sixiang Zhao, William B. Haskell, Michel-Alexandre Cardin

Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics

We study computationally and statistically efficient Reinforcement Learning algorithms for the linear Bellman Complete setting. This setting uses linear function approximation to capture value functions and unifies existing models like…

Machine Learning · Computer Science 2025-03-04 Runzhe Wu, Ayush Sekhari, Akshay Krishnamurthy, Wen Sun

Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning

We present the first finite-sample analysis of policy evaluation in robust average-reward Markov Decision Processes (MDPs). Prior work in this setting has established only asymptotic convergence guarantees, leaving open the question of…

Machine Learning · Statistics 2025-12-11 Yang Xu, Washim Uddin Mondal, Vaneet Aggarwal

Approximate Value Iteration for Risk-aware Markov Decision Processes

We consider large-scale Markov decision processes (MDPs) with a risk measure of variability in cost, under the risk-aware MDPs paradigm. Previous studies showed that risk-aware MDPs, based on a minimax approach to handling risk, can be…

Systems and Control · Computer Science 2017-05-17 Pengqian Yu, William B. Haskell, Huan Xu

Regularized Q-Learning with Linear Function Approximation

Regularized Markov Decision Processes serve as models of sequential decision making under uncertainty wherein the decision maker has limited information processing capacity and/or aversion to model ambiguity. With functional approximation,…

Artificial Intelligence · Computer Science 2025-02-11 Jiachen Xi, Alfredo Garcia, Petar Momcilovic

End-to-End Efficient RL for Linear Bellman Complete MDPs with Deterministic Transitions

We study reinforcement learning (RL) with linear function approximation in Markov Decision Processes (MDPs) satisfying linear Bellman completeness, a fundamental setting where the Bellman backup of any linear value function remains…

Machine Learning · Computer Science 2026-03-25 Zakaria Mhammedi, Alexander Rakhlin, Nneka Okolo

Dynamic Programming for POMDP with Jointly Discrete and Continuous State-Spaces

In this work, we study dynamic programming (DP) algorithms for partially observable Markov decision processes with jointly continuous and discrete state-spaces. We consider a class of stochastic systems which have coupled discrete and…

Optimization and Control · Mathematics 2019-03-07 Donghwan Lee, Niao He, Jianghai Hu

Hyperparameter Optimization for Randomized Algorithms: A Case Study on Random Features

Randomized algorithms exploit stochasticity to reduce computational complexity. One important example is random feature regression (RFR) that accelerates Gaussian process regression (GPR). RFR approximates an unknown function with a random…

Machine Learning · Computer Science 2025-02-26 Oliver R. A. Dunbar, Nicholas H. Nelsen, Maya Mutic

A Dynamic Programming Approach to Evaluating Multivariate Gaussian Probabilities

We propose a method of approximating multivariate Gaussian probabilities using dynamic programming. We show that solving the optimization problem associated with a class of discrete-time finite horizon Markov decision processes with…

Optimization and Control · Mathematics 2018-02-08 Morgan Jones, Matthew M. Peet

Partial Policy Iteration for L1-Robust Markov Decision Processes

Robust Markov decision processes (MDPs) make it possible to compute reliable solutions for dynamic decision problems whose evolution is modeled by rewards and partially-known transition probabilities. Unfortunately, accounting for uncertainty in the…

Machine Learning · Computer Science 2020-06-18 Chin Pang Ho, Marek Petrik, Wolfram Wiesemann

Factored Value Iteration Converges

In this paper we propose a novel algorithm, factored value iteration (FVI), for the approximate solution of factored Markov decision processes (fMDPs). The traditional approximate value iteration algorithm is modified in two ways. For one,…

Artificial Intelligence · Computer Science 2008-08-13 Istvan Szita, Andras Lorincz

On the Convergence of Modified Policy Iteration in Risk Sensitive Exponential Cost Markov Decision Processes

Modified policy iteration (MPI) is a dynamic programming algorithm that combines elements of policy iteration and value iteration. The convergence of MPI has been well studied in the context of discounted and average-cost MDPs. In this…

Machine Learning · Computer Science 2024-02-16 Yashaswini Murthy, Mehrdad Moharrami, R. Srikant

Bayesian Risk-Sensitive Policy Optimization For MDPs With General Loss Functions

Motivated by many application problems, we consider Markov decision processes (MDPs) with a general loss function and unknown parameters. To mitigate the epistemic uncertainty associated with unknown parameters, we take a Bayesian approach…

Machine Learning · Computer Science 2025-10-02 Xiaoshuang Wang, Yifan Lin, Enlu Zhou