Related papers: Sample-Optimal Parametric Q-Learning Using Linearl…

200 papers

The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular setting where the state space $\mathcal{S}$ and the action space $\mathcal{A}$ are both finite, to obtain a nearly optimal policy with…

Machine Learning · Computer Science 2022-10-28 Bingyan Wang, Yuling Yan, Jianqing Fan

We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path under an arbitrary policy of the…

Machine Learning · Computer Science 2018-10-24 Devavrat Shah, Qiaomin Xie

We seek to learn an effective policy for a Markov Decision Process (MDP) with continuous states via Q-Learning. Given a set of basis functions over state action pairs we search for a corresponding set of linear weights that minimizes the…

Machine Learning · Computer Science 2013-09-27 Charles Tripp, Ross D. Shachter

The objective is to study an on-line Hidden Markov model (HMM) estimation-based Q-learning algorithm for partially observable Markov decision process (POMDP) on finite state and action sets. When the full state observation is available,…

Machine Learning · Computer Science 2018-09-25 Hyung-Jin Yoon, Donghwan Lee, Naira Hovakimyan

The key assumption underlying linear Markov Decision Processes (MDPs) is that the learner has access to a known feature map $\phi(x, a)$ that maps state-action pairs to $d$-dimensional vectors, and that the rewards and transitions are…

Machine Learning · Computer Science 2023-09-20 Noah Golowich, Ankur Moitra, Dhruv Rohatgi

Analyzing the Markov decision process (MDP) with continuous state spaces is generally challenging. A recent interesting work \cite{shah2018q} solves MDP with bounded continuous state space by a nearest neighbor $Q$ learning approach, which…

Machine Learning · Computer Science 2024-06-18 Puning Zhao, Lifeng Lai

We study a Q learning algorithm for continuous time stochastic control problems. The proposed algorithm uses the sampled state process by discretizing the state and control action spaces under piece-wise constant control processes. We show…

Optimization and Control · Mathematics 2023-03-10 Erhan Bayraktar, Ali Devran Kara

Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy. Focusing on a $\gamma$-discounted…

Machine Learning · Computer Science 2022-09-13 Gen Li, Yuting Wei, Yuejie Chi, Yuantao Gu, Yuxin Chen

We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model. The objective is to devise a learning algorithm returning the best policy as early as…

Machine Learning · Statistics 2021-05-11 Aymen Al Marjani, Alexandre Proutiere

Consider the problem of approximating the optimal policy of a Markov decision process (MDP) by sampling state transitions. In contrast to existing reinforcement learning methods that are based on successive approximations to the nonlinear…

Machine Learning · Computer Science 2017-10-18 Mengdi Wang

Regularized Markov Decision Processes serve as models of sequential decision making under uncertainty wherein the decision maker has limited information processing capacity and/or aversion to model ambiguity. With functional approximation,…

Artificial Intelligence · Computer Science 2025-02-11 Jiachen Xi, Alfredo Garcia, Petar Momcilovic

In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents' preferences for certain outcomes. This paper proposes a new Q-learning algorithm for quantile optimization in…

Machine Learning · Computer Science 2024-11-01 Jia Lin Hau, Erick Delage, Esther Derman, Mohammad Ghavamzadeh, Marek Petrik

We present the convergence rates of synchronous and asynchronous Q-learning for average-reward Markov decision processes, where the absence of contraction poses a fundamental challenge. Existing non-asymptotic results overcome this…

Machine Learning · Computer Science 2026-01-30 Zijun Chen, Zaiwei Chen, Nian Si, Shengbo Wang

In this study, we derive Probably Approximately Correct (PAC) bounds on the asymptotic sample-complexity for RL within the infinite-horizon Markov Decision Process (MDP) setting that are sharper than those in existing literature. The…

Machine Learning · Computer Science 2025-07-17 Mohit Prashant, Arvind Easwaran

We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators. Given access to a generative model, we achieve rate-optimal sample…

Machine Learning · Computer Science 2024-05-13 Davide Maran, Alberto Maria Metelli, Matteo Papini, Marcello Restelli

Reinforcement learning algorithms often require finiteness of state and action spaces in Markov decision processes (MDPs) (also called controlled Markov chains) and various efforts have been made in the literature towards the applicability…

Machine Learning · Computer Science 2023-09-08 Ali Devran Kara, Naci Saldi, Serdar Yüksel

We study reinforcement learning with function approximation for large-scale Partially Observable Markov Decision Processes (POMDPs) where the state space and observation space are large or even continuous. Particularly, we consider Hilbert…

Machine Learning · Computer Science 2022-06-27 Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun

We study the problem of zero-delay coding for the transmission of a Markov source over a noisy channel with feedback and present a reinforcement learning solution which is guaranteed to achieve near-optimality. To this end, we formulate the…

Optimization and Control · Mathematics 2025-10-07 Liam Cregg, Fady Alajaji, Serdar Yuksel

Q-learning, which seeks to learn the optimal Q-function of a Markov decision process (MDP) in a model-free fashion, lies at the heart of reinforcement learning. When it comes to the synchronous setting (such that independent samples for all…

Machine Learning · Statistics 2025-03-18 Gen Li, Changxiao Cai, Yuxin Chen, Yuting Wei, Yuejie Chi

In this paper, we formulate the adaptive learning problem---the problem of how to find an individualized learning plan (called policy) that chooses the most appropriate learning materials based on learner's latent traits---faced in adaptive…

Machine Learning · Computer Science 2020-04-21 Xiao Li, Hanchen Xu, Jinming Zhang, Hua-hua Chang