Related papers: Optimal policy evaluation using kernel-based tempo…

200 papers

We consider policy evaluation in infinite-horizon discounted Markov decision problems (MDPs) with infinite spaces. We reformulate this task as a compositional stochastic program with a function-valued decision variable that belongs to a…

Optimization and Control · Mathematics 2020-05-19 Alec Koppel, Garrett Warnell, Ethan Stump, Peter Stone, Alejandro Ribeiro

We study non-parametric estimation of the value function of an infinite-horizon $\gamma$-discounted Markov reward process (MRP) using observations from a single trajectory. We provide non-asymptotic guarantees for a general family of…

Machine Learning · Statistics 2022-11-09 Yaqi Duan, Martin J. Wainwright

We propose a principled kernel-based policy iteration algorithm to solve the continuous-state Markov Decision Processes (MDPs). In contrast to most decision-theoretic planning frameworks, which assume fully known state transition models, we…

Robotics · Computer Science 2020-06-04 Junhong Xu, Kai Yin, Lantao Liu

This paper proposes a computationally tractable algorithm for learning infinite-horizon average-reward linear mixture Markov decision processes (MDPs) under the Bellman optimality condition. Our algorithm for linear mixture MDPs achieves a…

Machine Learning · Computer Science 2024-10-22 Woojin Chae, Kihyuk Hong, Yufan Zhang, Ambuj Tewari, Dabeen Lee

This paper presents four different ways of looking at the well-known Least Squares Temporal Differences (LSTD) algorithm for computing the value function of a Markov Reward Process, each of them leading to different insights: the…

Machine Learning · Statistics 2015-04-06 Kamil Ciosek

This paper proposes a computationally tractable algorithm for learning infinite-horizon average-reward linear Markov decision processes (MDPs) and linear mixture MDPs under the Bellman optimality condition. While guaranteeing computational…

Machine Learning · Computer Science 2024-09-25 Woojin Chae, Dabeen Lee

Distributional reinforcement learning (DRL) has achieved empirical success in various domains. One core task in DRL is distributional policy evaluation, which involves estimating the return distribution $\eta^\pi$ for a given policy $\pi$.…

Machine Learning · Statistics 2025-01-17 Yang Peng, Liangyu Zhang, Zhihua Zhang

Markov reward processes (MRPs) are used to model stochastic phenomena arising in operations research, control engineering, robotics, and artificial intelligence, as well as communication and transportation networks. In many of these cases,…

Machine Learning · Statistics 2020-09-17 Ashwin Pananjady, Martin J. Wainwright

In this paper, we study a mean-variance optimization problem in an infinite horizon discrete time discounted Markov decision process (MDP). The objective is to minimize the variance of system rewards with the constraint of mean performance.…

Optimization and Control · Mathematics 2017-08-24 Li Xia

In traditional statistical learning, data points are usually assumed to be independently and identically distributed (i.i.d.) following an unknown probability distribution. This paper presents a contrasting viewpoint, perceiving data points…

Machine Learning · Computer Science 2025-08-19 Yangchen Pan, Junfeng Wen, Chenjun Xiao, Philip Torr

We propose a diffusion approximation method to the continuous-state Markov Decision Processes (MDPs) that can be utilized to address autonomous navigation and control in unstructured off-road environments. In contrast to most…

Robotics · Computer Science 2024-02-08 Junhong Xu, Kai Yin, Zheng Chen, Jason M. Gregory, Ethan A. Stump, Lantao Liu

Representing, comparing, and measuring the distance between probability distributions is a key task in computational statistics and machine learning. The choice of representation and the associated distance determine properties of the…

Machine Learning · Statistics 2026-02-26 Masha Naslidnyk

We study the problem of learning optimal policies in finite-horizon Markov Decision Processes (MDPs) using low-rank reinforcement learning (RL) methods. In finite-horizon MDPs, the policies, and therefore the value functions (VFs) are not…

Machine Learning · Computer Science 2026-05-14 Sergio Rozada, Jose Luis Orejuela, Antonio G. Marques

While likelihood-based inference and its variants provide a statistically efficient and widely applicable approach to parametric inference, their application to models involving intractable likelihoods poses challenges. In this work, we…

Methodology · Statistics 2019-06-17 Francois-Xavier Briol, Alessandro Barp, Andrew B. Duncan, Mark Girolami

Temporal-difference learning is a popular algorithm for policy evaluation. In this paper, we study the convergence of the regularized non-parametric TD(0) algorithm, in both the independent and Markovian observation settings. In particular,…

Optimization and Control · Mathematics 2022-05-25 Eloïse Berthier, Ziad Kobeissi, Francis Bach

This paper investigates the optimization problem of an infinite stage discrete time Markov decision process (MDP) with a long-run average metric considering both mean and variance of rewards together. Such performance metric is important…

Optimization and Control · Mathematics 2020-08-11 Li Xia

This note re-visits the rolling-horizon control approach to the problem of a Markov decision process (MDP) with infinite-horizon discounted expected reward criterion. Distinguished from the classical value-iteration approach, we develop an…

Optimization and Control · Mathematics 2022-06-07 Hyeong Soo Chang

In robust Markov decision processes (MDPs), the uncertainty in the transition kernel is addressed by finding a policy that optimizes the worst-case performance over an uncertainty set of MDPs. While much of the literature has focused on…

Machine Learning · Computer Science 2023-03-02 Yue Wang, Alvaro Velasquez, George Atia, Ashley Prater-Bennette, Shaofeng Zou

We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs). In this setting, both the reward function and the transition kernel are linear with respect to the given feature maps and are…

Machine Learning · Computer Science 2024-12-24 Han Zhong, Zhongren Chen, Zhuoran Yang, Zhaoran Wang, Csaba Szepesvári

Markov decision processes (MDPs) with rewards are a widespread and well-studied model for systems that make both probabilistic and nondeterministic choices. A fundamental result about MDPs is that their minimal and maximal expected rewards…

Logic in Computer Science · Computer Science 2024-11-26 Kevin Batz, Benjamin Lucien Kaminski, Christoph Matheja, Tobias Winkler