Related papers: Learning POMDPs with Linear Function Approximation…

Near Optimal Approximations and Finite Memory Policies for POMPDs with Continuous Spaces

We study an approximation method for partially observed Markov decision processes (POMDPs) with continuous spaces. Belief MDP reduction, which has been the standard approach to study POMDPs requires rigorous approximation methods for…

Optimization and Control · Mathematics 2025-01-20 Ali Devran Kara , Erhan Bayraktar , Serdar Yuksel

Convergence of Finite Memory Q-Learning for POMDPs and Near Optimality of Learned Policies under Filter Stability

In this paper, for POMDPs, we provide the convergence of a Q learning algorithm for control policies using a finite history of past observations and control actions, and, consequentially, we establish near optimality of such limit Q…

Machine Learning · Computer Science 2022-10-27 Ali Devran Kara , Serdar Yuksel

Reinforcement Learning with Function Approximation for Non-Markov Processes

We study reinforcement learning methods with linear function approximation under non-Markov state and cost processes. We first consider the policy evaluation method and show that the algorithm converges under suitable ergodicity conditions…

Machine Learning · Computer Science 2026-01-05 Ali Devran Kara

Finite-Memory Strategies in POMDPs with Long-Run Average Objectives

Partially observable Markov decision processes (POMDPs) are standard models for dynamic systems with probabilistic and nondeterministic behaviour in uncertain environments. We prove that in POMDPs with long-run average objective, the…

Computer Science and Game Theory · Computer Science 2022-09-29 Krishnendu Chatterjee , Raimundo Saona , Bruno Ziliotto

Near Optimality of Finite Memory Feedback Policies in Partially Observed Markov Decision Processes

In the theory of Partially Observed Markov Decision Processes (POMDPs), existence of optimal policies have in general been established via converting the original partially observed stochastic control problem to a fully observed one on the…

Optimization and Control · Mathematics 2022-01-11 Ali Devran Kara , Serdar Yuksel

Experimental results : Reinforcement Learning of POMDPs using Spectral Methods

We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive)…

Artificial Intelligence · Computer Science 2017-06-20 Kamyar Azizzadenesheli , Alessandro Lazaric , Animashree Anandkumar

Scalable Policy-Based RL Algorithms for POMDPs

The continuous nature of belief states in POMDPs presents significant computational challenges in learning the optimal policy. In this paper, we consider an approach that solves a Partially Observable Reinforcement Learning (PORL) problem…

Machine Learning · Computer Science 2025-10-15 Ameya Anjarlekar , Rasoul Etesami , R Srikant

Provable Reinforcement Learning with a Short-Term Memory

Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions. Coping with partial…

Machine Learning · Computer Science 2022-02-09 Yonathan Efroni , Chi Jin , Akshay Krishnamurthy , Sobhan Miryoosefi

Reinforcement Learning of POMDPs using Spectral Methods

We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive)…

Artificial Intelligence · Computer Science 2017-06-20 Kamyar Azizzadenesheli , Alessandro Lazaric , Animashree Anandkumar

Finite-Time Analysis of Natural Actor-Critic for POMDPs

We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces, where the controller has access to only noisy observations of the underlying…

Machine Learning · Computer Science 2023-07-20 Semih Cayci , Niao He , R. Srikant

Model-Based Learning of Near-Optimal Finite-Window Policies in POMDPs

We study model-based learning of finite-window policies in tabular partially observable Markov decision processes (POMDPs). A common approach to learning under partial observability is to approximate unbounded history dependencies using…

Machine Learning · Computer Science 2026-04-02 Philip Jordan , Maryam Kamgarpour

Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency

We study reinforcement learning for partially observed Markov decision processes (POMDPs) with infinite observation and state spaces, which remains less investigated theoretically. To this end, we make the first attempt at bridging partial…

Machine Learning · Computer Science 2024-04-02 Qi Cai , Zhuoran Yang , Zhaoran Wang

Value-Function Approximations for Partially Observable Markov Decision Processes

Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a…

Artificial Intelligence · Computer Science 2011-06-02 M. Hauskrecht

Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems

We study Reinforcement Learning for partially observable dynamical systems using function approximation. We propose a new \textit{Partially Observable Bilinear Actor-Critic framework}, that is general enough to include models such as…

Machine Learning · Computer Science 2022-06-27 Masatoshi Uehara , Ayush Sekhari , Jason D. Lee , Nathan Kallus , Wen Sun

Reinforcement Learning of Markov Decision Processes with Peak Constraints

In this paper, we consider reinforcement learning of Markov Decision Processes (MDP) with peak constraints, where an agent chooses a policy to optimize an objective and at the same time satisfy additional constraints. The agent has to take…

Optimization and Control · Mathematics 2019-12-09 Ather Gattami

Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes

In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors, inducing confounding and biasing estimates…

Machine Learning · Computer Science 2023-03-24 Andrew Bennett , Nathan Kallus

Confident Natural Policy Gradient for Local Planning in $q_\pi$-realizable Constrained MDPs

The constrained Markov decision process (CMDP) framework emerges as an important reinforcement learning approach for imposing safety or other critical objectives while maximizing cumulative reward. However, the current understanding of how…

Machine Learning · Computer Science 2024-12-11 Tian Tian , Lin F. Yang , Csaba Szepesvári

Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity

Reinforcement learning algorithms often require finiteness of state and action spaces in Markov decision processes (MDPs) (also called controlled Markov chains) and various efforts have been made in the literature towards the applicability…

Machine Learning · Computer Science 2023-09-08 Ali Devran Kara , Naci Saldi , Serdar Yüksel

Practical Linear Value-approximation Techniques for First-order MDPs

Recent work on approximate linear programming (ALP) techniques for first-order Markov Decision Processes (FOMDPs) represents the value function linearly w.r.t. a set of first-order basis functions and uses linear programming techniques to…

Artificial Intelligence · Computer Science 2012-07-02 Scott Sanner , Craig Boutilier

Memoryless Policy Iteration for Episodic POMDPs

Memoryless and finite-memory policies offer a practical alternative for solving partially observable Markov decision processes (POMDPs), as they operate directly in the output space rather than in the high-dimensional belief space. However,…

Machine Learning · Computer Science 2025-12-15 Roy van Zuijlen , Duarte Antunes