English
Related papers

Related papers: Mitigating Partial Observability in Sequential Dec…

200 papers

We study the policy evaluation problem in multi-agent reinforcement learning, modeled by a Markov decision process. In this problem, the agents operate in a common environment under a fixed control policy, working together to discover the…

Optimization and Control · Mathematics 2020-01-13 Thinh T. Doan , Siva Theja Maguluri , Justin Romberg

A fundamental assumption of reinforcement learning in Markov decision processes (MDPs) is that the relevant decision process is, in fact, Markov. However, when MDPs have rich observations, agents typically learn by way of an abstract state…

Machine Learning · Computer Science 2024-03-18 Cameron Allen , Neev Parikh , Omer Gottesman , George Konidaris

Learning a near optimal policy in a partially observable system remains an elusive challenge in contemporary reinforcement learning. In this work, we consider episodic reinforcement learning in a reward-mixing Markov decision process (MDP).…

Machine Learning · Computer Science 2022-02-01 Jeongyeol Kwon , Yonathan Efroni , Constantine Caramanis , Shie Mannor

In practical applications, we can rarely assume full observability of a system's environment, despite such knowledge being important for determining a reactive control system's precise interaction with its environment. Therefore, we propose…

Machine Learning · Computer Science 2022-06-24 Edi Muskardin , Martin Tappler , Bernhard K. Aichernig , Ingo Pill

In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with…

Machine Learning · Computer Science 2024-06-12 Hongming Zhang , Tongzheng Ren , Chenjun Xiao , Dale Schuurmans , Bo Dai

Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL), aimed at efficiently assessing a policy's value function. TD($\lambda$), a potent variant, incorporates a memory trace to distribute the prediction…

Machine Learning · Computer Science 2024-02-13 Jianfei Ma

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…

Machine Learning · Computer Science 2018-11-07 Jalaj Bhandari , Daniel Russo , Raghav Singal

Sequential decision-making systems routinely operate with missing or incomplete data. Classical reinforcement learning theory, which is commonly used to solve sequential decision problems, assumes Markovian observability, which may not hold…

Machine Learning · Computer Science 2025-08-07 MaryLena Bleile , Minh-Nhat Phung , Minh-Binh Tran

Training reinforcement learning (RL) agents using scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. Moreover, handcrafting these reward functions before training is prone to…

Machine Learning · Computer Science 2023-10-04 Alessandro Abate , Yousif Almulla , James Fox , David Hyland , Michael Wooldridge

Imitation Learning from observation describes policy learning in a similar way to human learning. An agent's policy is trained by observing an expert performing a task. While many state-only imitation learning approaches are based on…

Machine Learning · Computer Science 2024-10-02 Damian Boborzi , Christoph-Nikolas Straehle , Jens S. Buchner , Lars Mikelsons

Temporal difference (TD) learning is a fundamental algorithm for estimating value functions in reinforcement learning. Recent finite-time analyses of TD with linear function approximation quantify its theoretical convergence rate. However,…

Machine Learning · Computer Science 2026-03-04 Yunxiang Li , Mark Schmidt , Reza Babanezhad , Sharan Vaswani

This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation tasks in reinforcement learning. Typically, the performance of TD(0) and TD($\lambda$) is very sensitive to the choice of stepsizes. Oftentimes,…

Optimization and Control · Mathematics 2021-10-12 Tao Sun , Han Shen , Tianyi Chen , Dongsheng Li

This paper presents a novel state representation for reward-free Markov decision processes. The idea is to learn, in a self-supervised manner, an embedding space where distances between pairs of embedded states correspond to the minimum…

Machine Learning · Computer Science 2022-05-05 Lorenzo Steccanella , Anders Jonsson

Assessing the systemic effects of uncertainty that arises from agents' partial observation of the true states of the world is critical for understanding a wide range of scenarios. Yet, previous modeling work on agent learning and…

Adaptation and Self-Organizing Systems · Physics 2022-04-15 Wolfram Barfuss , Richard P. Mann

Linear TD($\lambda$) is one of the most fundamental reinforcement learning algorithms for policy evaluation. Previously, convergence rates are typically established under the assumption of linearly independent features, which does not hold…

Machine Learning · Computer Science 2025-10-15 Zixuan Xie , Xinyu Liu , Rohan Chandra , Shangtong Zhang

TD($\lambda$) with function approximation has proved empirically successful for some complex reinforcement learning problems. For linear approximation, TD($\lambda$) has been shown to minimise the squared error between the approximate value…

Machine Learning · Computer Science 2025-12-24 Lex Weaver , Jonathan Baxter

This paper proposes a reinforcement learning method for controller synthesis of autonomous systems in unknown and partially-observable environments with subjective time-dependent safety constraints. Mathematically, we model the system…

Robotics · Computer Science 2021-04-06 Yu Wang , Alper Kamil Bozkurt , Miroslav Pajic

This work introduces a non-intrusive model reduction approach for learning reduced models from partially observed state trajectories of high-dimensional dynamical systems. The proposed approach compensates for the loss of information due to…

Machine Learning · Computer Science 2021-03-29 Wayne Isaac Tan Uy , Benjamin Peherstorfer

Non-stationary environments are challenging for reinforcement learning algorithms. If the state transition and/or reward functions change based on latent factors, the agent is effectively tasked with optimizing a behavior that maximizes…

Machine Learning · Computer Science 2021-05-21 Lucas N. Alegre , Ana L. C. Bazzan , Bruno C. da Silva

The average reward is a fundamental performance metric in reinforcement learning (RL) focusing on the long-run performance of an agent. Differential temporal difference (TD) learning algorithms are a major advance for average reward RL as…

Machine Learning · Computer Science 2026-02-19 Ethan Blaser , Jiuqi Wang , Shangtong Zhang
‹ Prev 1 2 3 10 Next ›