Related papers: Mitigating Partial Observability in Sequential Dec…

Finite-Time Performance of Distributed Temporal Difference Learning with Linear Function Approximation

We study the policy evaluation problem in multi-agent reinforcement learning, modeled by a Markov decision process. In this problem, the agents operate in a common environment under a fixed control policy, working together to discover the…

Optimization and Control · Mathematics 2020-01-13 Thinh T. Doan, Siva Theja Maguluri, Justin Romberg

Learning Markov State Abstractions for Deep Reinforcement Learning

A fundamental assumption of reinforcement learning in Markov decision processes (MDPs) is that the relevant decision process is, in fact, Markov. However, when MDPs have rich observations, agents typically learn by way of an abstract state…

Machine Learning · Computer Science 2024-03-18 Cameron Allen, Neev Parikh, Omer Gottesman, George Konidaris

Reinforcement Learning in Reward-Mixing MDPs

Learning a near optimal policy in a partially observable system remains an elusive challenge in contemporary reinforcement learning. In this work, we consider episodic reinforcement learning in a reward-mixing Markov decision process (MDP).…

Machine Learning · Computer Science 2022-02-01 Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

Reinforcement Learning under Partial Observability Guided by Learned Environment Models

In practical applications, we can rarely assume full observability of a system's environment, despite such knowledge being important for determining a reactive control system's precise interaction with its environment. Therefore, we propose…

Machine Learning · Computer Science 2022-06-24 Edi Muskardin, Martin Tappler, Bernhard K. Aichernig, Ingo Pill

Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning

In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with…

Machine Learning · Computer Science 2024-06-12 Hongming Zhang, Tongzheng Ren, Chenjun Xiao, Dale Schuurmans, Bo Dai

Discerning Temporal Difference Learning

Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL), aimed at efficiently assessing a policy's value function. TD($\lambda$), a potent variant, incorporates a memory trace to distribute the prediction…

Machine Learning · Computer Science 2024-02-13 Jianfei Ma

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…

Machine Learning · Computer Science 2018-11-07 Jalaj Bhandari, Daniel Russo, Raghav Singal

A Relative Ignorability Framework for Decision-Relevant Observability in Control Theory and Reinforcement Learning

Sequential decision-making systems routinely operate with missing or incomplete data. Classical reinforcement learning theory, which is commonly used to solve sequential decision problems, assumes Markovian observability, which may not hold…

Machine Learning · Computer Science 2025-08-07 MaryLena Bleile, Minh-Nhat Phung, Minh-Binh Tran

Learning Task Automata for Reinforcement Learning using Hidden Markov Models

Training reinforcement learning (RL) agents using scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. Moreover, handcrafting these reward functions before training is prone to…

Machine Learning · Computer Science 2023-10-04 Alessandro Abate, Yousif Almulla, James Fox, David Hyland, Michael Wooldridge

Imitation Learning by State-Only Distribution Matching

Imitation Learning from observation describes policy learning in a similar way to human learning. An agent's policy is trained by observing an expert performing a task. While many state-only imitation learning approaches are based on…

Machine Learning · Computer Science 2024-10-02 Damian Boborzi, Christoph-Nikolas Straehle, Jens S. Buchner, Lars Mikelsons

Towards Parameter-Free Temporal Difference Learning

Temporal difference (TD) learning is a fundamental algorithm for estimating value functions in reinforcement learning. Recent finite-time analyses of TD with linear function approximation quantify its theoretical convergence rate. However,…

Machine Learning · Computer Science 2026-03-04 Yunxiang Li, Mark Schmidt, Reza Babanezhad, Sharan Vaswani

Adaptive Temporal Difference Learning with Linear Function Approximation

This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation tasks in reinforcement learning. Typically, the performance of TD(0) and TD($\lambda$) is very sensitive to the choice of stepsizes. Oftentimes,…

Optimization and Control · Mathematics 2021-10-12 Tao Sun, Han Shen, Tianyi Chen, Dongsheng Li

State Representation Learning for Goal-Conditioned Reinforcement Learning

This paper presents a novel state representation for reward-free Markov decision processes. The idea is to learn, in a self-supervised manner, an embedding space where distances between pairs of embedded states correspond to the minimum…

Machine Learning · Computer Science 2022-05-05 Lorenzo Steccanella, Anders Jonsson

Modeling the effects of environmental and perceptual uncertainty using deterministic reinforcement learning dynamics with partial observability

Assessing the systemic effects of uncertainty that arises from agents' partial observation of the true states of the world is critical for understanding a wide range of scenarios. Yet, previous modeling work on agent learning and…

Adaptation and Self-Organizing Systems · Physics 2022-04-15 Wolfram Barfuss, Richard P. Mann

Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features

Linear TD($\lambda$) is one of the most fundamental reinforcement learning algorithms for policy evaluation. Previously, convergence rates are typically established under the assumption of linearly independent features, which does not hold…

Machine Learning · Computer Science 2025-10-15 Zixuan Xie, Xinyu Liu, Rohan Chandra, Shangtong Zhang

Reinforcement Learning From State and Temporal Differences

TD($\lambda$) with function approximation has proved empirically successful for some complex reinforcement learning problems. For linear approximation, TD($\lambda$) has been shown to minimise the squared error between the approximate value…

Machine Learning · Computer Science 2025-12-24 Lex Weaver, Jonathan Baxter

Reinforcement Learning with Temporal Logic Constraints for Partially-Observable Markov Decision Processes

This paper proposes a reinforcement learning method for controller synthesis of autonomous systems in unknown and partially-observable environments with subjective time-dependent safety constraints. Mathematically, we model the system…

Robotics · Computer Science 2021-04-06 Yu Wang, Alper Kamil Bozkurt, Miroslav Pajic

Operator inference of non-Markovian terms for learning reduced models from partially observed state trajectories

This work introduces a non-intrusive model reduction approach for learning reduced models from partially observed state trajectories of high-dimensional dynamical systems. The proposed approach compensates for the loss of information due to…

Machine Learning · Computer Science 2021-03-29 Wayne Isaac Tan Uy, Benjamin Peherstorfer

Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection

Non-stationary environments are challenging for reinforcement learning algorithms. If the state transition and/or reward functions change based on latent factors, the agent is effectively tasked with optimizing a behavior that maximizes…

Machine Learning · Computer Science 2021-05-21 Lucas N. Alegre, Ana L. C. Bazzan, Bruno C. da Silva

Almost Sure Convergence of Differential Temporal Difference Learning for Average Reward Markov Decision Processes

The average reward is a fundamental performance metric in reinforcement learning (RL) focusing on the long-run performance of an agent. Differential temporal difference (TD) learning algorithms are a major advance for average reward RL as…

Machine Learning · Computer Science 2026-02-19 Ethan Blaser, Jiuqi Wang, Shangtong Zhang