Related papers: Delay-Aware Model-Based Reinforcement Learning for…

Delay-Aware Multi-Agent Reinforcement Learning for Cooperative and Competitive Environments

Action and observation delays exist prevalently in the real-world cyber-physical systems which may pose challenges in reinforcement learning design. It is particularly an arduous task when handling multi-agent systems where the delay of one…

Machine Learning · Computer Science 2020-09-01 Baiming Chen, Mengdi Xu, Zuxin Liu, Liang Li, Ding Zhao

Revisiting State Augmentation methods for Reinforcement Learning with Stochastic Delays

Several real-world scenarios, such as remote control and sensing, are comprised of action and observation delays. The presence of delays degrades the performance of reinforcement learning (RL) algorithms, often to such an extent that…

Machine Learning · Computer Science 2021-08-18 Somjit Nath, Mayank Baranwal, Harshad Khadilkar

DiAReL: Reinforcement Learning with Disturbance Awareness for Robust Sim2Real Policy Transfer in Robot Control

Delayed Markov decision processes (DMDPs) fulfill the Markov property by augmenting the state space of agents with a finite time window of recently committed actions. In reliance on these state augmentations, delay-resolved reinforcement…

Robotics · Computer Science 2025-11-17 Mohammadhossein Malmir, Josip Josifovski, Noah Klarmann, Alois Knoll

Delays in Reinforcement Learning

Delays are inherent to most dynamical systems. Besides shifting the process in time, they can significantly affect their performance. For this reason, it is usually valuable to study the delay and account for it. Because they are dynamical…

Machine Learning · Computer Science 2023-09-21 Pierre Liotet

Model-Based Reinforcement Learning under Random Observation Delays

Delays frequently occur in real-world environments, yet standard reinforcement learning (RL) algorithms often assume instantaneous perception of the environment. We study random sensor delays in POMDPs, where observations may arrive…

Machine Learning · Computer Science 2026-04-17 Armin Karamzade, Kyungmin Kim, JB Lanier, Davide Corsi, Roy Fox

Adaptive Reinforcement Learning for Unobservable Random Delays

In standard Reinforcement Learning (RL) settings, the interaction between the agent and the environment is typically modeled as a Markov Decision Process (MDP), which assumes that the agent observes the system state instantaneously, selects…

Machine Learning · Computer Science 2025-06-18 John Wikman, Alexandre Proutiere, David Broman

Reinforcement Learning for Control Systems with Time Delays: A Comprehensive Survey

In the last decade, Reinforcement Learning (RL) has achieved remarkable success in the control and decision-making of complex dynamical systems. However, most RL algorithms rely on the Markov Decision Process assumption, which is violated…

Machine Learning · Statistics 2026-02-03 Armando Alves Neto

Reinforcement Learning from Delayed Observations via World Models

In standard reinforcement learning settings, agents typically assume immediate feedback about the effects of their actions after taking them. However, in practice, this assumption may not hold true due to physical constraints and can…

Machine Learning · Computer Science 2024-06-27 Armin Karamzade, Kyungmin Kim, Montek Kalsi, Roy Fox

Acting in Delayed Environments with Non-Stationary Markov Policies

The standard Markov Decision Process (MDP) formulation hinges on the assumption that an action is executed immediately after it was chosen. However, this assumption is often unrealistic and can lead to catastrophic failures in applications such…

Machine Learning · Computer Science 2023-12-14 Esther Derman, Gal Dalal, Shie Mannor

Learning Adversarial Markov Decision Processes with Delayed Feedback

Reinforcement learning typically assumes that agents observe feedback for their actions immediately, but in many real-world applications (like recommendation systems) feedback is observed in delay. This paper studies online learning in…

Machine Learning · Computer Science 2021-12-16 Tal Lancewicki, Aviv Rosenberg, Yishay Mansour

Learn to Change the World: Multi-level Reinforcement Learning with Model-Changing Actions

Reinforcement learning usually assumes a given or sometimes even fixed environment in which an agent seeks an optimal policy to maximize its long-term discounted reward. In contrast, we consider agents that are not limited to passive…

Machine Learning · Computer Science 2025-10-20 Ziqing Lu, Babak Hassibi, Lifeng Lai, Weiyu Xu

Transferable Delay-Aware Reinforcement Learning via Implicit Causal Graph Modeling

Random delays weaken the temporal correspondence between actions and subsequent state feedback, making it difficult for agents to identify the true propagation process of action effects. In cross-task scenarios, changes in task objectives…

Machine Learning · Computer Science 2026-05-13 Chenran Zhao, Dianxi Shi, Yaowen Zhang, Chunping Qiu, Shaowu Yang

Decentralized Deep Reinforcement Learning for Delay-Power Tradeoff in Vehicular Communications

This paper targets the problem of radio resource management for expected long-term delay-power tradeoff in vehicular communications. At each decision epoch, the road side unit observes the global network state, allocates channels and…

Signal Processing · Electrical Eng. & Systems 2019-06-04 Xianfu Chen, Celimuge Wu, Honggang Zhang, Yan Zhang, Mehdi Bennis, Heli Vuojala

DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays

Classic reinforcement learning (RL) frequently confronts challenges in tasks involving delays, which cause a mismatch between received observations and subsequent actions, thereby deviating from the Markov assumption. Existing methods…

Machine Learning · Computer Science 2024-06-06 Bo Xia, Yilun Kong, Yongzhe Chang, Bo Yuan, Zhiheng Li, Xueqian Wang, Bin Liang

A Survey on Delay-Aware Resource Control for Wireless Systems --- Large Deviation Theory, Stochastic Lyapunov Drift and Distributed Stochastic Learning

In this tutorial paper, a comprehensive survey is given on several major systematic approaches in dealing with delay-aware control problems, namely the equivalent rate constraint approach, the Lyapunov stability drift approach and the…

Performance · Computer Science 2016-11-17 Ying Cui, Vincent K. N. Lau, Rui Wang, Huang Huang, Shunqing Zhang

Model-based Reinforcement Learning with Multi-step Plan Value Estimation

A promising way to improve the sample efficiency of reinforcement learning is model-based methods, in which many explorations and evaluations can happen in the learned models to save real-world samples. However, when the learned model has a…

Machine Learning · Computer Science 2022-09-14 Haoxin Lin, Yihao Sun, Jiaji Zhang, Yang Yu

Feature Markov Decision Processes

General purpose intelligent learning agents cycle through (complex, non-MDP) sequences of observations, actions, and rewards. On the other hand, reinforcement learning is well-developed for small finite state Markov Decision Processes…

Artificial Intelligence · Computer Science 2009-12-30 Marcus Hutter

Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes

Average-reward Markov decision processes (MDPs) provide a foundational framework for sequential decision-making under uncertainty. However, average-reward MDPs have remained largely unexplored in reinforcement learning (RL) settings, with…

Machine Learning · Computer Science 2025-08-29 Juan Sebastian Rojas, Chi-Guhn Lee

Reinforcement Learning for Learning of Dynamical Systems in Uncertain Environment: a Tutorial

In this paper, a review of model-free reinforcement learning for learning of dynamical systems in uncertain environments is discussed. For this purpose, the Markov Decision Process (MDP) will be reviewed. Furthermore, some learning…

Machine Learning · Computer Science 2019-05-21 Mehran Attar, Mohammadreza Dabirian

Model-Based Reinforcement Learning Under Confounding

We investigate model-based reinforcement learning in contextual Markov decision processes (C-MDPs) in which the context is unobserved and induces confounding in the offline dataset. In such settings, conventional model-learning methods are…

Machine Learning · Computer Science 2025-12-09 Nishanth Venkatesh, Andreas A. Malikopoulos