Related papers: Intervention Efficient Algorithm for Two-Stage Cau…

200 papers

We introduce causal Markov Decision Processes (C-MDPs), a new formalism for sequential decision making which combines the standard MDP formulation with causal structures over state transition and reward functions. Many contemporary and…

Machine Learning · Statistics · 2021-02-16 · Yangyi Lu, Amirhossein Meisami, Ambuj Tewari

We study the problem of determining the best intervention in a Causal Bayesian Network (CBN) specified only by its causal graph. We model this as a stochastic multi-armed bandit (MAB) problem with side-information, where the interventions…

Machine Learning · Computer Science · 2022-05-20 · Aurghya Maiti, Vineet Nair, Gaurav Sinha

This work studies online episodic tabular Markov decision processes (MDPs) with known transitions and develops best-of-both-worlds algorithms that achieve refined data-dependent regret bounds in the adversarial regime and variance-dependent…

Machine Learning · Computer Science · 2026-02-03 · Mingyi Li, Taira Tsuchiya, Kenji Yamanishi

We consider Markov Decision Processes (MDPs) with deterministic transitions and study the problem of regret minimization, which is central to the analysis and design of optimal learning algorithms. We present logarithmic problem-specific…

Machine Learning · Computer Science · 2021-06-29 · Damianos Tranos, Alexandre Proutiere

Online reinforcement learning in infinite-horizon Markov decision processes (MDPs) remains less theoretically and algorithmically developed than its episodic counterpart, with many algorithms suffering from high "burn-in" costs and…

Machine Learning · Computer Science · 2026-03-26 · Guy Zamir, Matthew Zurek, Yudong Chen

Learning good interventions in a causal graph can be modelled as a stochastic multi-armed bandit problem with side-information. First, we study this problem when interventions are more expensive than observations and a budget is specified.…

Machine Learning · Computer Science · 2020-12-15 · Vineet Nair, Vishakha Patil, Gaurav Sinha

We study variance-dependent regret bounds for Markov decision processes (MDPs). Algorithms with variance-dependent regret guarantees can automatically exploit environments with low variance (e.g., enjoying constant regret on deterministic…

Machine Learning · Computer Science · 2023-05-23 · Runlong Zhou, Zihan Zhang, Simon S. Du

Causal knowledge about the relationships among decision variables and a reward variable in a bandit setting can accelerate the learning of an optimal decision. Current works often assume the causal graph is known, which may not always be…

Machine Learning · Statistics · 2024-11-07 · Muhammad Qasim Elahi, Mahsa Ghasemi, Murat Kocaoglu

A standard assumption in Reinforcement Learning is that the agent observes every visited state-action pair in the associated Markov Decision Process (MDP), along with the per-step rewards. Strong theoretical results are known in this…

Machine Learning · Computer Science · 2026-02-03 · Zhengjia Zhuo, Anupam Gupta, Viswanath Nagarajan

In the optimization of dynamical systems, the variables typically have constraints. Such problems can be modeled as a constrained Markov Decision Process (CMDP). This paper considers a model-free approach to the problem, where the…

Machine Learning · Computer Science · 2021-02-02 · Qinbo Bai, Vaneet Aggarwal, Ather Gattami

We derive a novel asymptotic problem-dependent lower-bound for regret minimization in finite-horizon tabular Markov Decision Processes (MDPs). While, similar to prior work (e.g., for ergodic MDPs), the lower-bound is the solution to an…

Machine Learning · Computer Science · 2021-06-25 · Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric

This paper presents a new model-free algorithm for episodic finite-horizon Markov Decision Processes (MDP), Adaptive Multi-step Bootstrap (AMB), which enjoys a stronger gap-dependent regret bound. The first innovation is to estimate the…

Machine Learning · Computer Science · 2021-07-05 · Haike Xu, Tengyu Ma, Simon S. Du

We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in this setting. Our…

Machine Learning · Computer Science · 2026-03-16 · Antoine Moulin, Gergely Neu, Luca Viano

We study the problem of learning 'good' interventions in a stochastic environment modeled by its underlying causal graph. Good interventions refer to interventions that maximize rewards. Specifically, we consider the setting of a…

Machine Learning · Computer Science · 2024-01-17 · Fateme Jamshidi, Jalal Etesami, Negar Kiyavash

In constrained Markov decision processes (CMDPs) with adversarial rewards and constraints, a well-known impossibility result prevents any algorithm from attaining both sublinear regret and sublinear constraint violation, when competing…

Machine Learning · Computer Science · 2024-09-27 · Francesco Emanuele Stradi, Anna Lunghi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

We propose novel classical and quantum online algorithms for learning finite-horizon and infinite-horizon average-reward Markov Decision Processes (MDPs). Our algorithms are based on a hybrid exploration-generative reinforcement learning…

Machine Learning · Computer Science · 2025-08-12 · Andris Ambainis, Joao F. Doriguello, Debbie Lim

In this paper, we consider an infinite horizon average reward Markov Decision Process (MDP). Distinguishing itself from existing works within this context, our approach harnesses the power of the general policy gradient-based algorithm,…

Machine Learning · Computer Science · 2024-02-06 · Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal

We provide improved gap-dependent regret bounds for reinforcement learning in finite episodic Markov decision processes. Compared to prior work, our bounds depend on alternative definitions of gaps. These definitions are based on the…

Machine Learning · Computer Science · 2021-10-27 · Christoph Dann, Teodor V. Marinov, Mehryar Mohri, Julian Zimmert

We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components. Assuming the factorization is known, we propose two model-based…

Machine Learning · Computer Science · 2020-06-25 · Yi Tian, Jian Qian, Suvrit Sra

We introduce a new framework of episodic tabular Markov decision processes (MDPs) with adversarial preferences, which we refer to as preference-based MDPs (PbMDPs). Unlike standard episodic MDPs with adversarial losses, where the numerical…

Machine Learning · Computer Science · 2025-07-17 · Taira Tsuchiya, Shinji Ito, Haipeng Luo