Related papers: Reinforcement Learning with Parameterized Actions
We propose a novel model-based reinforcement learning algorithm -- Dynamics Learning and predictive control with Parameterized Actions (DLPA) -- for Parameterized Action Markov Decision Processes (PAMDPs). The agent learns a…
In this paper, we consider reinforcement learning of Markov Decision Processes (MDP) with peak constraints, where an agent chooses a policy to optimize an objective and at the same time satisfy additional constraints. The agent has to take…
In this paper, a review of model-free reinforcement learning for learning of dynamical systems in uncertain environments has discussed. For this purpose, the Markov Decision Process (MDP) will be reviewed. Furthermore, some learning…
We present a model-free reinforcement learning algorithm to find an optimal policy for a finite-horizon Markov decision process while guaranteeing a desired lower bound on the probability of satisfying a signal temporal logic (STL)…
In this paper, we use concepts from supervisory control theory of discrete event systems to propose a method to learn optimal control policies for a finite-state Markov Decision Process (MDP) in which (only) certain sequences of actions are…
General purpose intelligent learning agents cycle through (complex,non-MDP) sequences of observations, actions, and rewards. On the other hand, reinforcement learning is well-developed for small finite state Markov Decision Processes…
Model-free Reinforcement Learning (RL) algorithms such as Q-learning [Watkins, Dayan 92] have been widely used in practice and can achieve human level performance in applications such as video games [Mnih et al. 15]. Recently, equipped with…
General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small…
Reinforcement learning would enjoy better success on real-world problems if domain knowledge could be imparted to the algorithm by the modelers. Most problems have both hidden state and unknown dynamics. Partially observable Markov decision…
Reinforcement learning studies how an agent should interact with an environment to maximize its cumulative reward. A standard way to study this question abstractly is to ask how many samples an agent needs from the environment to learn an…
We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMCs is a collection of entities of…
We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive)…
In this paper, we formulate the adaptive learning problem---the problem of how to find an individualized learning plan (called policy) that chooses the most appropriate learning materials based on learner's latent traits---faced in adaptive…
We propose a novel parameterized skill-learning algorithm that aims to learn transferable parameterized skills and synthesize them into a new action space that supports efficient learning in long-horizon tasks. We propose to leverage…
We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive)…
In this paper, we consider the problem of optimization and learning for constrained and multi-objective Markov decision processes, for both discounted rewards and expected average rewards. We formulate the problems as zero-sum games where…
Safe reinforcement learning has been a promising approach for optimizing the policy of an agent that operates in safety-critical applications. In this paper, we propose an algorithm, SNO-MDP, that explores and optimizes Markov decision…
Real-world sequential decision-making often involves parameterized action spaces that require both, decisions regarding discrete actions and decisions about continuous action parameters governing how an action is executed. Existing…
Reinforcement learning in complex environments may require supervision to prevent the agent from attempting dangerous actions. As a result of supervisor intervention, the executed action may differ from the action specified by the policy.…
In meta-reinforcement learning, an agent is trained in multiple different environments and attempts to learn a meta-policy that can efficiently adapt to a new environment. This paper presents RAMP, a Reinforcement learning Agent using Model…