Related papers: Empirical Dynamic Programming

An Empirical Dynamic Programming Algorithm for Continuous MDPs

We propose universal randomized function approximation-based empirical value iteration (EVI) algorithms for Markov decision processes. The 'empirical' nature comes from each iteration being done empirically from samples available from…

Optimization and Control · Mathematics · 2019-04-25 · William B. Haskell, Rahul Jain, Hiteshi Sharma, Pengqian Yu

Value Iteration with Guessing for Markov Chains and Markov Decision Processes

Two standard models for probabilistic systems are Markov chains (MCs) and Markov decision processes (MDPs). Classic objectives for such probabilistic models for control and planning problems are reachability and stochastic shortest path.…

Artificial Intelligence · Computer Science · 2025-05-13 · Krishnendu Chatterjee, Mahdi JafariRaviz, Raimundo Saona, Jakub Svoboda

Partial Policy Iteration for L1-Robust Markov Decision Processes

Robust Markov decision processes (MDPs) allow one to compute reliable solutions for dynamic decision problems whose evolution is modeled by rewards and partially-known transition probabilities. Unfortunately, accounting for uncertainty in the…

Machine Learning · Computer Science · 2020-06-18 · Chin Pang Ho, Marek Petrik, Wolfram Wiesemann

On the Convergence of Modified Policy Iteration in Risk Sensitive Exponential Cost Markov Decision Processes

Modified policy iteration (MPI) is a dynamic programming algorithm that combines elements of policy iteration and value iteration. The convergence of MPI has been well studied in the context of discounted and average-cost MDPs. In this…

Machine Learning · Computer Science · 2024-02-16 · Yashaswini Murthy, Mehrdad Moharrami, R. Srikant

Dynamic Programming Through the Lens of Semismooth Newton-Type Methods (Extended Version)

Policy iteration and value iteration are at the core of many (approximate) dynamic programming methods. For Markov Decision Processes with finite state and action spaces, we show that they are instances of semismooth Newton-type methods to…

Optimization and Control · Mathematics · 2022-06-28 · Matilde Gargiani, Andrea Zanelli, Dominic Liao-McPherson, Tyler Summers, John Lygeros

Empirical Q-Value Iteration

We propose a new simple and natural algorithm for learning the optimal Q-value function of a discounted-cost Markov Decision Process (MDP) when the transition kernels are unknown. Unlike the classical learning algorithms for MDPs, such as…

Optimization and Control · Mathematics · 2019-01-31 · Dileep Kalathil, Vivek S. Borkar, Rahul Jain

Dynamic Programming for Epistemic Uncertainty in Markov Decision Processes

In this paper, we propose a general theory of ambiguity-averse MDPs, which treats the uncertain transition probabilities as random variables and evaluates a policy via a risk measure applied to its random return. This ambiguity-averse MDP…

Computer Science and Game Theory · Computer Science · 2026-02-04 · Axel Benyamine, Julien Grand-Clément, Marek Petrik, Michael I. Jordan, Alain Durmus

Predictable Interval MDPs through Entropy Regularization

Regularization of control policies using entropy can be instrumental in adjusting predictability of real-world systems. Applications benefiting from such approaches range from, e.g., cybersecurity, which aims at maximal unpredictability, to…

Systems and Control · Electrical Eng. & Systems · 2026-02-18 · Menno van Zutphen, Giannis Delimpaltadakis, Maurice Heemels, Duarte Antunes

Dynamic Programming for POMDP with Jointly Discrete and Continuous State-Spaces

In this work, we study dynamic programming (DP) algorithms for partially observable Markov decision processes with jointly continuous and discrete state-spaces. We consider a class of stochastic systems which have coupled discrete and…

Optimization and Control · Mathematics · 2019-03-07 · Donghwan Lee, Niao He, Jianghai Hu

Robust Entropy-regularized Markov Decision Processes

Stochastic and soft optimal policies resulting from entropy-regularized Markov decision processes (ER-MDP) are desirable for exploration and imitation learning applications. Motivated by the fact that such policies are sensitive with…

Machine Learning · Computer Science · 2022-01-03 · Tien Mai, Patrick Jaillet

A Contracting Dynamical System Perspective toward Interval Markov Decision Processes

Interval Markov decision processes are a class of Markov models where the transition probabilities between the states belong to intervals. In this paper, we study the problem of efficient estimation of the optimal policies in Interval…

Systems and Control · Electrical Eng. & Systems · 2023-09-19 · Saber Jafarpour, Samuel Coogan

Efficient Learning for Entropy-Regularized Markov Decision Processes via Multilevel Monte Carlo

Designing efficient learning algorithms with complexity guarantees for Markov decision processes (MDPs) with large or continuous state and action spaces remains a fundamental challenge. We address this challenge for entropy-regularized MDPs…

Machine Learning · Computer Science · 2025-06-05 · Matthieu Meunier, Christoph Reisinger, Yufei Zhang

Unrolling Dynamic Programming via Graph Filters

Dynamic programming (DP) is a fundamental tool used across many engineering fields. The main goal of DP is to solve Bellman's optimality equations for a given Markov decision process (MDP). Standard methods like policy iteration exploit the…

Artificial Intelligence · Computer Science · 2025-07-30 · Sergio Rozada, Samuel Rey, Gonzalo Mateos, Antonio G. Marques

Formally Verified Solution Methods for Infinite-Horizon Markov Decision Processes

We formally verify executable algorithms for solving Markov decision processes (MDPs) in the interactive theorem prover Isabelle/HOL. We build on existing formalizations of probability theory to analyze the expected total reward criterion…

Artificial Intelligence · Computer Science · 2023-03-09 · Maximilian Schäfeller, Mohammad Abdulaziz

Economic Model Predictive Control as a Solution to Markov Decision Processes

Markov Decision Processes (MDPs) offer a fairly generic and powerful framework to discuss the notion of optimal policies for dynamic systems, in particular when the dynamics are stochastic. However, computing the optimal policy of an MDP…

Systems and Control · Electrical Eng. & Systems · 2024-07-24 · Dirk Reinhardt, Akhil S. Anand, Shambhuraj Sawant, Sebastien Gros

Value Iteration for Long-run Average Reward in Markov Decision Processes

Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long term performance. Value iteration (VI)…

Systems and Control · Computer Science · 2017-09-01 · Pranav Ashok, Krishnendu Chatterjee, Przemyslaw Daca, Jan Křetínský, Tobias Meggendorfer

Geometric Re-Analysis of Classical MDP Solving Algorithms

We build on a recently introduced geometric interpretation of Markov Decision Processes (MDPs) to analyze classical MDP-solving algorithms: Value Iteration (VI) and Policy Iteration (PI). First, we develop a geometry-based analytical…

Machine Learning · Computer Science · 2025-03-07 · Arsenii Mustafin, Aleksei Pakharev, Alex Olshevsky, Ioannis Ch. Paschalidis

J-P: MDP. FP. PP.: Characterizing Total Expected Rewards in Markov Decision Processes as Least Fixed Points with an Application to Operational Semantics of Probabilistic Programs (Technical Report)

Markov decision processes (MDPs) with rewards are a widespread and well-studied model for systems that make both probabilistic and nondeterministic choices. A fundamental result about MDPs is that their minimal and maximal expected rewards…

Logic in Computer Science · Computer Science · 2024-11-26 · Kevin Batz, Benjamin Lucien Kaminski, Christoph Matheja, Tobias Winkler

Anytime Point-Based Approximations for Large POMDPs

The Partially Observable Markov Decision Process has long been recognized as a rich framework for real-world planning and control problems, especially in robotics. However, exact solutions in this framework are typically computationally…

Artificial Intelligence · Computer Science · 2011-10-05 · J. Pineau, G. Gordon, S. Thrun

An approximate dynamic programming approach to the admission control of elective patients

In this paper, we propose an approximate dynamic programming (ADP) algorithm to solve a Markov decision process (MDP) formulation for the admission control of elective patients. To manage the elective patients from multiple specialties…

Optimization and Control · Mathematics · 2021-03-10 · Jian Zhang, Mahjoub Dridi, Abdellah El Moudni