Related papers: Model-Based Reinforcement Learning Under Confoundi…

Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes

In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors, inducing confounding and biasing estimates…

Machine Learning · Computer Science 2023-03-24 Andrew Bennett, Nathan Kallus

Goal-oriented inference of environment from redundant observations

The agent learns to organize decision behavior to achieve a behavioral goal, such as reward maximization, and reinforcement learning is often used for this optimization. Learning an optimal behavioral strategy is difficult under the…

Machine Learning · Computer Science 2023-05-09 Kazuki Takahashi, Tomoki Fukai, Yutaka Sakai, Takashi Takekawa

Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes

We study offline reinforcement learning (RL) in the face of unmeasured confounders. Due to the lack of online interaction with the environment, offline RL faces the following two significant challenges: (i) the agent may be…

Machine Learning · Computer Science 2022-09-20 Zuyue Fu, Zhengling Qi, Zhaoran Wang, Zhuoran Yang, Yanxun Xu, Michael R. Kosorok

Reinforcement Learning in Reward-Mixing MDPs

Learning a near optimal policy in a partially observable system remains an elusive challenge in contemporary reinforcement learning. In this work, we consider episodic reinforcement learning in a reward-mixing Markov decision process (MDP).…

Machine Learning · Computer Science 2022-02-01 Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

Block Contextual MDPs for Continual Learning

In reinforcement learning (RL), when defining a Markov Decision Process (MDP), the environment dynamics is implicitly assumed to be stationary. This assumption of stationarity, while simplifying, can be unrealistic in many scenarios. In the…

Machine Learning · Computer Science 2021-10-15 Shagun Sodhani, Franziska Meier, Joelle Pineau, Amy Zhang

Model-Based Exploration in Monitored Markov Decision Processes

A tenet of reinforcement learning is that the agent always observes rewards. However, this is not true in many realistic settings, e.g., a human observer may not always be available to provide rewards, sensors may be limited or…

Machine Learning · Computer Science 2026-03-24 Alireza Kazemipour, Simone Parisi, Matthew E. Taylor, Michael Bowling

Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection

Non-stationary environments are challenging for reinforcement learning algorithms. If the state transition and/or reward functions change based on latent factors, the agent is effectively tasked with optimizing a behavior that maximizes…

Machine Learning · Computer Science 2021-05-21 Lucas N. Alegre, Ana L. C. Bazzan, Bruno C. da Silva

On Reward Structures of Markov Decision Processes

A Markov decision process can be parameterized by a transition kernel and a reward function. Both play essential roles in the study of reinforcement learning as evidenced by their presence in the Bellman equations. In our inquiry of various…

Machine Learning · Computer Science 2023-09-04 Falcon Z. Dai

Cross-fitted Proximal Learning for Model-Based Reinforcement Learning

Model-based reinforcement learning is attractive for sequential decision-making because it explicitly estimates reward and transition models and then supports planning through simulated rollouts. In offline settings with hidden confounding,…

Machine Learning · Computer Science 2026-04-08 Nishanth Venkatesh, Andreas A. Malikopoulos

Reinforcement Learning of Markov Decision Processes with Peak Constraints

In this paper, we consider reinforcement learning of Markov Decision Processes (MDP) with peak constraints, where an agent chooses a policy to optimize an objective and at the same time satisfy additional constraints. The agent has to take…

Optimization and Control · Mathematics 2019-12-09 Ather Gattami

Reinforcement Learning with Continuous Actions Under Unmeasured Confounding

This paper addresses the challenge of offline policy learning in reinforcement learning with continuous action spaces when unmeasured confounders are present. While most existing research focuses on policy evaluation within partially…

Machine Learning · Statistics 2025-05-02 Yuhan Li, Eugene Han, Yifan Hu, Wenzhuo Zhou, Zhengling Qi, Yifan Cui, Ruoqing Zhu

Generalization in Monitored Markov Decision Processes (Mon-MDPs)

Reinforcement learning (RL) typically models the interaction between the agent and environment as a Markov decision process (MDP), where the rewards that guide the agent's behavior are always observable. However, in many real-world…

Artificial Intelligence · Computer Science 2025-05-15 Montaser Mohammedalamen, Michael Bowling

Constrained Latent Action Policies for Model-Based Offline Reinforcement Learning

In offline reinforcement learning, a policy is learned using a static dataset in the absence of costly feedback from the environment. In contrast to the online setting, only using static datasets poses additional challenges, such as…

Machine Learning · Computer Science 2025-12-16 Marvin Alles, Philip Becker-Ehmck, Patrick van der Smagt, Maximilian Karl

Reinforcement Learning under Model Mismatch

We study reinforcement learning under model misspecification, where we do not have access to the true environment but only to a reasonably close approximation to it. We address this problem by extending the framework of robust MDPs to the…

Machine Learning · Computer Science 2017-11-10 Aurko Roy, Huan Xu, Sebastian Pokutta

Explainable Reinforcement Learning via Model Transforms

Understanding emerging behaviors of reinforcement learning (RL) agents may be difficult since such agents are often trained in complex environments using highly complex decision making procedures. This has given rise to a variety of…

Artificial Intelligence · Computer Science 2022-12-02 Mira Finkelstein, Lucy Liu, Nitsan Levy Schlot, Yoav Kolumbus, David C. Parkes, Jeffrey S. Rosenshein, Sarah Keren

Observational Overfitting in Reinforcement Learning

A major component of overfitting in model-free reinforcement learning (RL) involves the case where the agent may mistakenly correlate reward with certain spurious features from the observations generated by the Markov Decision Process…

Machine Learning · Computer Science 2020-01-01 Xingyou Song, Yiding Jiang, Stephen Tu, Yilun Du, Behnam Neyshabur

Reinforcement Learning in Switching Non-Stationary Markov Decision Processes: Algorithms and Convergence Analysis

Reinforcement learning in non-stationary environments is challenging due to abrupt and unpredictable changes in dynamics, often causing traditional algorithms to fail to converge. However, in many real-world cases, non-stationarity has some…

Machine Learning · Computer Science 2025-03-25 Mohsen Amiri, Sindri Magnússon

Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation

The shortcomings of maximum likelihood estimation in the context of model-based reinforcement learning have been highlighted by an increasing number of papers. When the model class is misspecified or has a limited representational capacity,…

Machine Learning · Computer Science 2021-06-08 Evgenii Nikishin, Romina Abachi, Rishabh Agarwal, Pierre-Luc Bacon

Learn to Change the World: Multi-level Reinforcement Learning with Model-Changing Actions

Reinforcement learning usually assumes a given or sometimes even fixed environment in which an agent seeks an optimal policy to maximize its long-term discounted reward. In contrast, we consider agents that are not limited to passive…

Machine Learning · Computer Science 2025-10-20 Ziqing Lu, Babak Hassibi, Lifeng Lai, Weiyu Xu

Learning Generalizable Representations for Reinforcement Learning via Adaptive Meta-learner of Behavioral Similarities

How to learn an effective reinforcement learning-based model for control tasks from high-level visual observations is a practical and challenging problem. A key to solving this problem is to learn low-dimensional state representations from…

Machine Learning · Computer Science 2022-12-27 Jianda Chen, Sinno Jialin Pan