Related papers: Efficient Local Planning with Linear Function Appr…
We consider approximate dynamic programming in $\gamma$-discounted Markov decision processes and apply it to approximate planning with linear value-function approximation. Our first contribution is a new variant of Approximate Policy…
We consider local planning in fixed-horizon MDPs with a generative model under the assumption that the optimal value function lies close to the span of a feature map. The generative model provides a local access to the MDP: The planner can…
We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes that involves the use of stochastic approximation algorithms along with state-of-the-art techniques that are…
Modern Reinforcement Learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximation must be deployed to approximate either the value function or the policy. The introduction of…
We study computationally and statistically efficient reinforcement learning under the linear $Q^{\pi}$ realizability assumption, where any policy's $Q$-function is linear in a given state-action feature representation. Prior methods in this…
Large-scale Markov decision processes (MDPs) require planning algorithms with runtime independent of the number of states of the MDP. We consider the planning problem in MDPs using linear value function approximation with only weak…
A practical challenge in reinforcement learning are combinatorial action spaces that make planning computationally demanding. For example, in cooperative multi-agent reinforcement learning, a potentially large number of agents jointly…
One of the most natural approaches to reinforcement learning (RL) with function approximation is value iteration, which inductively generates approximations to the optimal value function by solving a sequence of regression problems. To…
$Q$-learning with function approximation is one of the most popular methods in reinforcement learning. Though the idea of using function approximation was proposed at least 60 years ago, even in the simplest setup, i.e, approximating…
Policy optimization methods with function approximation are widely used in multi-agent reinforcement learning. However, it remains elusive how to design such algorithms with statistical guarantees. Leveraging a multi-agent performance…
Decentralized online planning can be an attractive paradigm for cooperative multi-agent systems, due to improved scalability and robustness. A key difficulty of such approach lies in making accurate predictions about the decisions of other…
Safe exploration is a key to applying reinforcement learning (RL) in safety-critical systems. Existing safe exploration methods guaranteed safety under the assumption of regularity, and it has been difficult to apply them to large-scale…
Local Policy Search is a popular reinforcement learning approach for handling large state spaces. Formally, it searches locally in a paramet erized policy space in order to maximize the associated value function averaged over some…
Many real-world reinforcement learning tasks require control of complex dynamical systems that involve both costly data acquisition processes and large state spaces. In cases where the transition dynamics can be readily evaluated at…
A basic simulation-based reinforcement learning algorithm is the Monte Carlo Exploring States (MCES) method, also known as optimistic policy iteration, in which the value function is approximated by simulated returns and a greedy policy is…
A common technique in reinforcement learning is to evaluate the value function from Monte Carlo simulations of a given policy, and use the estimated value function to obtain a new policy which is greedy with respect to the estimated value…
We show that in a cooperative $N$-agent network, one can design locally executable policies for the agents such that the resulting discounted sum of average rewards (value) well approximates the optimal value computed over all (including…
In the real world, insufficient information, limited computation resources, and complex problem structures often force an autonomous agent to make a decision in time less than that required to solve the problem at hand completely. Flexible…
This paper investigates the computational complexity of reinforcement learning in a novel linear function approximation regime, termed partial $q^{\pi}$-realizability. In this framework, the objective is to learn an $\epsilon$-optimal…
How can we plan efficiently in a large and complex environment when the time budget is limited? Given the original simulator of the environment, which may be computationally very demanding, we propose to learn online an approximate but much…