Related papers: Variance-Adaptive Optimal Algorithm for Reinforcem…

Reinforcement Learning algorithms for regret minimization in structured Markov Decision Processes

A recent goal in the Reinforcement Learning (RL) framework is to choose a sequence of actions or a policy to maximize the reward collected or minimize the regret incurred in a finite time horizon. For several RL problems in operation…

Machine Learning · Computer Science · 2016-08-18 · K J Prabuchandran, Tejas Bodas, Theja Tulabandhula

Online Model Selection for Reinforcement Learning with Function Approximation

Deep reinforcement learning has achieved impressive successes yet often requires a very large amount of interaction data. This result is perhaps unsurprising, as using complicated function approximation often requires more data to fit, and…

Machine Learning · Computer Science · 2020-11-20 · Jonathan N. Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma Brunskill

Randomized Exploration for Reinforcement Learning with Multinomial Logistic Function Approximation

We study reinforcement learning with multinomial logistic (MNL) function approximation where the underlying transition probability kernel of the Markov decision processes (MDPs) is parametrized by an unknown transition core with features of…

Machine Learning · Statistics · 2024-11-01 · Wooseong Cho, Taehyun Hwang, Joongkyu Lee, Min-hwan Oh

Minimax Optimal Variance-Aware Regret Bounds for Multinomial Logistic MDPs

We study reinforcement learning for episodic Markov Decision Processes (MDPs) whose transitions are modelled by a multinomial logistic (MNL) model. Existing algorithms for MNL mixture MDPs yield a regret of $\tilde{O}(dH^2\sqrt{T})$…

Artificial Intelligence · Computer Science · 2026-05-20 · Pierre Boudart, Pierre Gaillard, Alessandro Rudi

Variational Regret Bounds for Reinforcement Learning

We consider undiscounted reinforcement learning in Markov decision processes (MDPs) where both the reward functions and the state-transition probabilities may vary (gradually or abruptly) over time. For this problem setting, we propose an…

Machine Learning · Computer Science · 2019-09-11 · Pratik Gajane, Ronald Ortner, Peter Auer

Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments

We study variance-dependent regret bounds for Markov decision processes (MDPs). Algorithms with variance-dependent regret guarantees can automatically exploit environments with low variance (e.g., enjoying constant regret on deterministic…

Machine Learning · Computer Science · 2023-05-23 · Runlong Zhou, Zihan Zhang, Simon S. Du

Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities

We study model-based reinforcement learning in an unknown finite communicating Markov decision process. We propose a simple algorithm that leverages a variance based confidence interval. We show that the proposed algorithm, UCRL-V, achieves…

Machine Learning · Computer Science · 2019-12-12 · Aristide Tossou, Debabrota Basu, Christos Dimitrakakis

Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation

We study model-based reinforcement learning (RL) for episodic Markov decision processes (MDP) whose transition probability is parametrized by an unknown transition core with features of state and action. Despite much recent progress in…

Machine Learning · Statistics · 2024-11-19 · Taehyun Hwang, Min-hwan Oh

Learning in Markov Decision Processes under Constraints

We consider reinforcement learning (RL) in Markov Decision Processes in which an agent repeatedly interacts with an environment that is modeled by a controlled Markov process. At each time step $t$, it earns a reward, and also incurs a…

Machine Learning · Computer Science · 2023-03-16 · Rahul Singh, Abhishek Gupta, Ness B. Shroff

Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes

We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition probability can be parameterized as a linear function of a given…

Machine Learning · Computer Science · 2023-11-07 · Jiafan He, Heyang Zhao, Dongruo Zhou, Quanquan Gu

Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation

We study a new class of MDPs that employs multinomial logit (MNL) function approximation to ensure valid probability distributions over the state space. Despite its significant benefits, incorporating the non-linear function raises…

Machine Learning · Computer Science · 2025-01-17 · Long-Fei Li, Yu-Jie Zhang, Peng Zhao, Zhi-Hua Zhou

Adaptive Resolving Methods for Reinforcement Learning with Function Approximations

Reinforcement learning (RL) problems are fundamental in online decision-making and have been instrumental in finding an optimal policy for Markov decision processes (MDPs). Function approximations are usually deployed to handle large or…

Machine Learning · Computer Science · 2025-05-20 · Jiashuo Jiang, Yiming Zong, Yinyu Ye

Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation

We study lifelong reinforcement learning (RL) in a regret minimization setting of linear contextual Markov decision process (MDP), where the agent needs to learn a multi-task policy while solving a streaming sequence of tasks. We propose an…

Machine Learning · Computer Science · 2022-06-02 · Sanae Amani, Lin F. Yang, Ching-An Cheng

A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit

In this paper, we consider the contextual variant of the MNL-Bandit problem. More specifically, we consider a dynamic set optimization problem, where a decision-maker offers a subset (assortment) of products to a consumer and observes the…

Machine Learning · Computer Science · 2024-04-16 · Priyank Agrawal, Theja Tulabandhula, Vashist Avadhanula

Nearly Minimax Optimal Regret for Multinomial Logistic Bandit

In this paper, we study the contextual multinomial logit (MNL) bandit problem in which a learning agent sequentially selects an assortment based on contextual information, and user feedback follows an MNL choice model. There has been a…

Machine Learning · Statistics · 2025-10-17 · Joongkyu Lee, Min-hwan Oh

Near-optimal Reinforcement Learning in Factored MDPs

Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and $S$ and $A$ are the cardinalities of the state and action…

Machine Learning · Statistics · 2014-11-04 · Ian Osband, Benjamin Van Roy

Infinite-Horizon Reinforcement Learning with Multinomial Logistic Function Approximation

We study model-based reinforcement learning with non-linear function approximation where the transition function of the underlying Markov decision process (MDP) is given by a multinomial logistic (MNL) model. We develop a provably efficient…

Machine Learning · Computer Science · 2024-10-15 · Jaehyun Park, Junyeop Kwon, Dabeen Lee

Regret-Optimal Model-Free Reinforcement Learning for Discounted MDPs with Short Burn-In Time

A crucial problem in reinforcement learning is learning the optimal policy. We study this in tabular infinite-horizon discounted Markov decision processes under the online setting. The existing algorithms either fail to achieve regret…

Machine Learning · Computer Science · 2023-12-13 · Xiang Ji, Gen Li

Multinomial Logit Contextual Bandits: Provable Optimality and Practicality

We consider a sequential assortment selection problem where the user choice is given by a multinomial logit (MNL) choice model whose parameters are unknown. In each period, the learning agent observes a $d$-dimensional contextual…

Machine Learning · Statistics · 2021-03-26 · Min-hwan Oh, Garud Iyengar

The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret and Policy Switches

In this paper, we study the problem of regret minimization for episodic Reinforcement Learning (RL) both in the model-free and the model-based setting. We focus on learning with general function classes and general model classes, and we…

Machine Learning · Computer Science · 2022-03-04 · Grigoris Velegkas, Zhuoran Yang, Amin Karbasi