Related papers: Optimal policy evaluation using kernel-based tempo…

Policy Evaluation in Continuous MDPs with Efficient Kernelized Gradient Temporal Difference

We consider policy evaluation in infinite-horizon discounted Markov decision problems (MDPs) with infinite spaces. We reformulate this task as a compositional stochastic program with a function-valued decision variable that belongs to a…

Optimization and Control · Mathematics · 2020-05-19 · Alec Koppel, Garrett Warnell, Ethan Stump, Peter Stone, Alejandro Ribeiro

Policy evaluation from a single path: Multi-step methods, mixing and mis-specification

We study non-parametric estimation of the value function of an infinite-horizon $\gamma$-discounted Markov reward process (MRP) using observations from a single trajectory. We provide non-asymptotic guarantees for a general family of…

Machine Learning · Statistics · 2022-11-09 · Yaqi Duan, Martin J. Wainwright

Kernel Taylor-Based Value Function Approximation for Continuous-State Markov Decision Processes

We propose a principled kernel-based policy iteration algorithm to solve the continuous-state Markov Decision Processes (MDPs). In contrast to most decision-theoretic planning frameworks, which assume fully known state transition models, we…

Robotics · Computer Science · 2020-06-04 · Junhong Xu, Kai Yin, Lantao Liu

Learning Infinite-Horizon Average-Reward Linear Mixture MDPs of Bounded Span

This paper proposes a computationally tractable algorithm for learning infinite-horizon average-reward linear mixture Markov decision processes (MDPs) under the Bellman optimality condition. Our algorithm for linear mixture MDPs achieves a…

Machine Learning · Computer Science · 2024-10-22 · Woojin Chae, Kihyuk Hong, Yufan Zhang, Ambuj Tewari, Dabeen Lee

Properties of the Least Squares Temporal Difference learning algorithm

This paper presents four different ways of looking at the well-known Least Squares Temporal Differences (LSTD) algorithm for computing the value function of a Markov Reward Process, each of them leading to different insights: the…

Machine Learning · Statistics · 2015-04-06 · Kamil Ciosek

Provably Efficient Infinite-Horizon Average-Reward Reinforcement Learning with Linear Function Approximation

This paper proposes a computationally tractable algorithm for learning infinite-horizon average-reward linear Markov decision processes (MDPs) and linear mixture MDPs under the Bellman optimality condition. While guaranteeing computational…

Machine Learning · Computer Science · 2024-09-25 · Woojin Chae, Dabeen Lee

Statistical Efficiency of Distributional Temporal Difference Learning and Freedman's Inequality in Hilbert Spaces

Distributional reinforcement learning (DRL) has achieved empirical success in various domains. One core task in DRL is distributional policy evaluation, which involves estimating the return distribution $\eta^\pi$ for a given policy $\pi$.…

Machine Learning · Statistics · 2025-01-17 · Yang Peng, Liangyu Zhang, Zhihua Zhang

Instance-dependent $\ell_\infty$-bounds for policy evaluation in tabular reinforcement learning

Markov reward processes (MRPs) are used to model stochastic phenomena arising in operations research, control engineering, robotics, and artificial intelligence, as well as communication and transportation networks. In many of these cases,…

Machine Learning · Statistics · 2020-09-17 · Ashwin Pananjady, Martin J. Wainwright

Mean-Variance Optimization of Discrete Time Discounted Markov Decision Processes

In this paper, we study a mean-variance optimization problem in an infinite horizon discrete time discounted Markov decision process (MDP). The objective is to minimize the variance of system rewards with the constraint of mean performance.…

Optimization and Control · Mathematics · 2017-08-24 · Li Xia

An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models

In traditional statistical learning, data points are usually assumed to be independently and identically distributed (i.i.d.) following an unknown probability distribution. This paper presents a contrasting viewpoint, perceiving data points…

Machine Learning · Computer Science · 2025-08-19 · Yangchen Pan, Junfeng Wen, Chenjun Xiao, Philip Torr

Kernel-based diffusion approximated Markov decision processes for autonomous navigation and control on unstructured terrains

We propose a diffusion approximation method to the continuous-state Markov Decision Processes (MDPs) that can be utilized to address autonomous navigation and control in unstructured off-road environments. In contrast to most…

Robotics · Computer Science · 2024-02-08 · Junhong Xu, Kai Yin, Zheng Chen, Jason M. Gregory, Ethan A. Stump, Lantao Liu

Scalable Kernel-Based Distances for Statistical Inference and Integration

Representing, comparing, and measuring the distance between probability distributions is a key task in computational statistics and machine learning. The choice of representation and the associated distance determine properties of the…

Machine Learning · Statistics · 2026-02-26 · Masha Naslidnyk

Addressing Finite-Horizon MDPs via Low-Rank Tensor Value Approximation

We study the problem of learning optimal policies in finite-horizon Markov Decision Processes (MDPs) using low-rank reinforcement learning (RL) methods. In finite-horizon MDPs, the policies, and therefore the value functions (VFs) are not…

Machine Learning · Computer Science · 2026-05-14 · Sergio Rozada, Jose Luis Orejuela, Antonio G. Marques

Statistical Inference for Generative Models with Maximum Mean Discrepancy

While likelihood-based inference and its variants provide a statistically efficient and widely applicable approach to parametric inference, their application to models involving intractable likelihoods poses challenges. In this work, we…

Methodology · Statistics · 2019-06-17 · Francois-Xavier Briol, Alessandro Barp, Andrew B. Duncan, Mark Girolami

A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning

Temporal-difference learning is a popular algorithm for policy evaluation. In this paper, we study the convergence of the regularized non-parametric TD(0) algorithm, in both the independent and Markovian observation settings. In particular,…

Optimization and Control · Mathematics · 2022-05-25 · Eloïse Berthier, Ziad Kobeissi, Francis Bach

Risk-Sensitive Markov Decision Processes with Combined Metrics of Mean and Variance

This paper investigates the optimization problem of an infinite stage discrete time Markov decision process (MDP) with a long-run average metric considering both mean and variance of rewards together. Such performance metric is important…

Optimization and Control · Mathematics · 2020-08-11 · Li Xia

On Supervised On-line Rolling-Horizon Control for Infinite-Horizon Discounted Markov Decision Processes

This note re-visits the rolling-horizon control approach to the problem of a Markov decision process (MDP) with infinite-horizon discounted expected reward criterion. Distinguished from the classical value-iteration approach, we develop an…

Optimization and Control · Mathematics · 2022-06-07 · Hyeong Soo Chang

Robust Average-Reward Markov Decision Processes

In robust Markov decision processes (MDPs), the uncertainty in the transition kernel is addressed by finding a policy that optimizes the worst-case performance over an uncertainty set of MDPs. While much of the literature has focused on…

Machine Learning · Computer Science · 2023-03-02 · Yue Wang, Alvaro Velasquez, George Atia, Ashley Prater-Bennette, Shaofeng Zou

Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs

We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs). In this setting, both the reward function and the transition kernel are linear with respect to the given feature maps and are…

Machine Learning · Computer Science · 2024-12-24 · Han Zhong, Zhongren Chen, Zhuoran Yang, Zhaoran Wang, Csaba Szepesvári

J-P: MDP. FP. PP.: Characterizing Total Expected Rewards in Markov Decision Processes as Least Fixed Points with an Application to Operational Semantics of Probabilistic Programs (Technical Report)

Markov decision processes (MDPs) with rewards are a widespread and well-studied model for systems that make both probabilistic and nondeterministic choices. A fundamental result about MDPs is that their minimal and maximal expected rewards…

Logic in Computer Science · Computer Science · 2024-11-26 · Kevin Batz, Benjamin Lucien Kaminski, Christoph Matheja, Tobias Winkler