Related papers: Randomized Exploration for Reinforcement Learning …

200 papers

We study model-based reinforcement learning (RL) for episodic Markov decision processes (MDPs) whose transition probability is parametrized by an unknown transition core with features of state and action. Despite much recent progress in…

Machine Learning · Statistics 2024-11-19 Taehyun Hwang, Min-hwan Oh

We study model-based reinforcement learning with non-linear function approximation where the transition function of the underlying Markov decision process (MDP) is given by a multinomial logistic (MNL) model. We develop a provably efficient…

Machine Learning · Computer Science 2024-10-15 Jaehyun Park, Junyeop Kwon, Dabeen Lee

We study a new class of MDPs that employs multinomial logit (MNL) function approximation to ensure valid probability distributions over the state space. Despite its significant benefits, incorporating the non-linear function raises…

Machine Learning · Computer Science 2025-01-17 Long-Fei Li, Yu-Jie Zhang, Peng Zhao, Zhi-Hua Zhou

Reinforcement learning with multinomial logistic (MNL) function approximation has become an important framework due to its flexibility and broad applicability. While existing studies have established regret guarantees under worst-case…

Machine Learning · Statistics 2026-05-28 Wonyoung Kim, Min-Hwan Oh, Garud Iyengar, Assaf Zeevi

We study reinforcement learning for episodic Markov Decision Processes (MDPs) whose transitions are modelled by a multinomial logistic (MNL) model. Existing algorithms for MNL mixture MDPs yield a regret of $\tilde{O}(dH^2\sqrt{T})$…

Artificial Intelligence · Computer Science 2026-05-20 Pierre Boudart, Pierre Gaillard, Alessandro Rudi

We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition probability can be parameterized as a linear function of a given…

Machine Learning · Computer Science 2023-11-07 Jiafan He, Heyang Zhao, Dongruo Zhou, Quanquan Gu

We consider online reinforcement learning in episodic Markov decision processes (MDPs) with an unknown transition function and stochastic rewards drawn from some fixed but unknown distribution. The learner aims to learn the optimal policy and…

Machine Learning · Computer Science 2024-03-12 Vincent Leon, S. Rasoul Etesami

We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight. We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic…

Machine Learning · Computer Science 2023-05-23 Runlong Zhou, Ruosong Wang, Simon S. Du

We study algorithms using randomized value functions for exploration in reinforcement learning. This type of algorithm enjoys appealing empirical performance. We show that when we use 1) a single random seed in each episode, and 2) a…

Machine Learning · Computer Science 2022-10-14 Zhihan Xiong, Ruoqi Shen, Qiwen Cui, Maryam Fazel, Simon S. Du

In this paper, we study the episodic reinforcement learning (RL) problem modeled by finite-horizon Markov Decision Processes (MDPs) with a constraint on the number of batches. The multi-batch reinforcement learning framework, where the agent…

Machine Learning · Computer Science 2022-10-18 Zihan Zhang, Yuhang Jiang, Yuan Zhou, Xiangyang Ji

The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions. Further progress hinges on combining RL with modern function approximators such as kernel functions and deep neural…

Machine Learning · Computer Science 2021-01-01 Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I. Jordan

We consider reinforcement learning (RL) in Markov Decision Processes in which an agent repeatedly interacts with an environment that is modeled by a controlled Markov process. At each time step $t$, it earns a reward, and also incurs a…

Machine Learning · Computer Science 2023-03-16 Rahul Singh, Abhishek Gupta, Ness B. Shroff

A central issue lying at the heart of online reinforcement learning (RL) is data efficiency. While a number of recent works achieved asymptotically minimal regret in online RL, the optimality of these results is only guaranteed in a…

Machine Learning · Computer Science 2025-04-30 Zihan Zhang, Yuxin Chen, Jason D. Lee, Simon S. Du

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning (RL). When the state space is large or continuous, traditional tabular approaches are unfeasible and some form of function approximation is mandatory.…

Machine Learning · Computer Science 2023-09-11 Andrea Zanette, David Brandfonbrener, Emma Brunskill, Matteo Pirotta, Alessandro Lazaric

This paper studies model-based reinforcement learning (RL) for regret minimization. We focus on finite-horizon episodic RL where the transition model $P$ belongs to a known family of models $\mathcal{P}$, a special case of which is when…

Machine Learning · Computer Science 2020-06-02 Alex Ayoub, Zeyu Jia, Csaba Szepesvari, Mengdi Wang, Lin F. Yang

We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in this setting. Our…

Machine Learning · Computer Science 2026-03-16 Antoine Moulin, Gergely Neu, Luca Viano

Learning Markov decision processes (MDPs) in the presence of an adversary is a challenging problem in reinforcement learning (RL). In this paper, we study RL in episodic MDPs with adversarial reward and full information feedback, where the…

Machine Learning · Computer Science 2022-04-21 Jiafan He, Dongruo Zhou, Quanquan Gu

While quantum reinforcement learning (RL) has attracted a surge of attention recently, its theoretical understanding is limited. In particular, it remains elusive how to design provably efficient quantum RL algorithms that can address the…

Quantum Physics · Physics 2024-06-14 Han Zhong, Jiachen Hu, Yecheng Xue, Tongyang Li, Liwei Wang

We study the regret guarantee for risk-sensitive reinforcement learning (RSRL) via distributional reinforcement learning (DRL) methods. In particular, we consider finite episodic Markov decision processes whose objective is the entropic…

Machine Learning · Computer Science 2024-01-26 Hao Liang, Zhi-Quan Luo

We study risk-sensitive reinforcement learning (RL) based on an entropic risk measure in episodic non-stationary Markov decision processes (MDPs). Both the reward functions and the state transition kernels are unknown and allowed to vary…

Machine Learning · Computer Science 2022-11-22 Yuhao Ding, Ming Jin, Javad Lavaei