Related papers: Reinforcement Learning from Partial Observation: L…

Provable Reinforcement Learning with a Short-Term Memory

Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions. Coping with partial…

Machine Learning · Computer Science 2022-02-09 Yonathan Efroni, Chi Jin, Akshay Krishnamurthy, Sobhan Miryoosefi

Sample-Efficient Reinforcement Learning of Undercomplete POMDPs

Partial observability is a common challenge in many reinforcement learning applications, which requires an agent to maintain memory, infer latent states, and integrate this past information into exploration. This challenge leads to a number…

Machine Learning · Computer Science 2020-10-27 Chi Jin, Sham M. Kakade, Akshay Krishnamurthy, Qinghua Liu

Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems

We study Reinforcement Learning for partially observable dynamical systems using function approximation. We propose a new Partially Observable Bilinear Actor-Critic framework, that is general enough to include models such as…

Machine Learning · Computer Science 2022-06-27 Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun

Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning

In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with…

Machine Learning · Computer Science 2024-06-12 Hongming Zhang, Tongzheng Ren, Chenjun Xiao, Dale Schuurmans, Bo Dai

Provably Efficient Partially Observable Risk-Sensitive Reinforcement Learning with Hindsight Observation

This work pioneers regret analysis of risk-sensitive reinforcement learning in partially observable environments with hindsight observation, addressing a gap in theoretical exploration. We introduce a novel formulation that integrates…

Machine Learning · Computer Science 2024-02-29 Tonghe Zhang, Yu Chen, Longbo Huang

Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes

In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors, inducing confounding and biasing estimates…

Machine Learning · Computer Science 2023-03-24 Andrew Bennett, Nathan Kallus

Learning POMDPs with Linear Function Approximation and Finite Memory

We study reinforcement learning with linear function approximation and finite-memory approximations for partially observed Markov decision processes (POMDPs). We first present an algorithm for the value evaluation of finite-memory feedback…

Optimization and Control · Mathematics 2025-05-22 Ali Devran Kara

Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings

We study reinforcement learning with function approximation for large-scale Partially Observable Markov Decision Processes (POMDPs) where the state space and observation space are large or even continuous. Particularly, we consider Hilbert…

Machine Learning · Computer Science 2022-06-27 Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun

Near-Optimal Partially Observable Reinforcement Learning with Partial Online State Information

Partially observable Markov decision processes (POMDPs) are a general framework for sequential decision-making under latent state uncertainty, yet learning in POMDPs is intractable in the worst case. Motivated by sensing and probing…

Machine Learning · Computer Science 2026-01-27 Ming Shi, Yingbin Liang, Ness B. Shroff

When Is Partially Observable Reinforcement Learning Not Scary?

Applications of Reinforcement Learning (RL), in which agents learn to make a sequence of decisions despite lacking complete information about the latent states of the controlled system, that is, they act under partial observability of the…

Machine Learning · Computer Science 2022-05-26 Qinghua Liu, Alan Chung, Csaba Szepesvári, Chi Jin

Reinforcement Learning of POMDPs using Spectral Methods

We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive)…

Artificial Intelligence · Computer Science 2017-06-20 Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar

Embed to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency

Reinforcement learning in partially observed Markov decision processes (POMDPs) faces two challenges. (i) It often takes the full history to predict the future, which induces a sample complexity that scales exponentially with the horizon.…

Machine Learning · Computer Science 2024-04-02 Lingxiao Wang, Qi Cai, Zhuoran Yang, Zhaoran Wang

A Convolution and Attention Based Encoder for Reinforcement Learning under Partial Observability

Partially Observable Markov Decision Processes (POMDPs) remain a core challenge in reinforcement learning due to incomplete state information. We address this by reformulating POMDPs as fully observable processes with fixed-length…

Machine Learning · Computer Science 2025-09-16 Wuhao Wang, Zhiyong Chen

Experimental results : Reinforcement Learning of POMDPs using Spectral Methods

We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive)…

Artificial Intelligence · Computer Science 2017-06-20 Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar

Learning in Observable POMDPs, without Computationally Intractable Oracles

Much of reinforcement learning theory is built on top of oracles that are computationally hard to implement. Specifically for learning near-optimal policies in Partially Observable Markov Decision Processes (POMDPs), existing algorithms…

Machine Learning · Computer Science 2022-06-08 Noah Golowich, Ankur Moitra, Dhruv Rohatgi

Reinforcement Learning with Temporal Logic Constraints for Partially-Observable Markov Decision Processes

This paper proposes a reinforcement learning method for controller synthesis of autonomous systems in unknown and partially-observable environments with subjective time-dependent safety constraints. Mathematically, we model the system…

Robotics · Computer Science 2021-04-06 Yu Wang, Alper Kamil Bozkurt, Miroslav Pajic

Goal-oriented inference of environment from redundant observations

The agent learns to organize decision behavior to achieve a behavioral goal, such as reward maximization, and reinforcement learning is often used for this optimization. Learning an optimal behavioral strategy is difficult under the…

Machine Learning · Computer Science 2023-05-09 Kazuki Takahashi, Tomoki Fukai, Yutaka Sakai, Takashi Takekawa

PAC Reinforcement Learning for Predictive State Representations

In this paper we study online Reinforcement Learning (RL) in partially observable dynamical systems. We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models such as…

Machine Learning · Computer Science 2022-08-16 Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

Learning Near Optimal Policies with Low Inherent Bellman Error

We study the exploration problem with approximate linear action-value functions in episodic reinforcement learning under the notion of low inherent Bellman error, a condition normally employed to show convergence of approximate value…

Machine Learning · Computer Science 2020-06-30 Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill

PAC Reinforcement Learning with Rich Observations

We propose and study a new model for reinforcement learning with rich observations, generalizing contextual bandits to sequential decision making. These models require an agent to take actions based on observations (features) with the goal…

Machine Learning · Computer Science 2016-10-31 Akshay Krishnamurthy, Alekh Agarwal, John Langford