Related papers: Reinforcement Learning Algorithm Selection
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms. For…
Policy gradient reinforcement learning (RL) algorithms have achieved impressive performance in challenging learning tasks such as continuous control, but suffer from high sample complexity. Experience replay is a commonly used approach to…
Reinforcement learning addresses the dilemma between exploration to find profitable actions and exploitation to act according to the best observations already made. Bandit problems are one such class of problems in stateless environments…
We consider model selection in stochastic bandit and reinforcement learning problems. Given a set of base learning algorithms, an effective model selection strategy adapts to the best learning algorithm in an online fashion. We show that by…
Time-constrained decision processes have been ubiquitous in many fundamental applications in physics, biology and computer science. Recently, restart strategies have gained significant attention for boosting the efficiency of…
Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents decision-making through interactions with their environment. A key subset includes stochastic multi-armed bandit (MAB) and…
Algorithm selection is typically based on models of algorithm performance, learned during a separate offline training sequence, which can be prohibitively expensive. In recent work, we adopted an online approach, in which a performance…
In this paper, we present the $\texttt{e-COP}$ algorithm, the first policy optimization algorithm for constrained Reinforcement Learning (RL) in episodic (finite horizon) settings. Such formulations are applicable when there are separate…
In this paper, we present an online reinforcement learning algorithm for constrained Markov decision processes with a safety constraint. Despite the necessary attention of the scientific community, considering stochastic stopping time, the…
In modern ML Ops environments, model deployment is a critical process that traditionally relies on static heuristics such as validation error comparisons and A/B testing. However, these methods require human intervention to adapt to…
Instance-specific algorithm selection (AS) deals with the automatic selection of an algorithm from a fixed set of candidates most suitable for a specific instance of an algorithmic problem class, where "suitability" often refers to an…
A self-learning adaptive system (SLAS) uses machine learning to enable and enhance its adaptability. Such systems are expected to perform well in dynamic situations. For learning high-performance adaptation policy, some assumptions must be…
We introduce a new class of reinforcement learning methods referred to as {\em episodic multi-armed bandits} (eMAB). In eMAB the learner proceeds in {\em episodes}, each composed of several {\em steps}, in which it chooses an action and…
Recently developed offline reinforcement learning algorithms have made it possible to learn policies directly from pre-collected datasets, giving rise to a new dilemma for practitioners: Since the performance the algorithms are able to…
Algorithm selection (AS) deals with the automatic selection of an algorithm from a fixed set of candidate algorithms most suitable for a specific instance of an algorithmic problem class, where "suitability" often refers to an algorithm's…
Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process, through repeated experience. In a given environment, the agent policy provides him some running and terminal…
An automatic machine learning (AutoML) task is to select the best algorithm and its hyper-parameters simultaneously. Previously, the hyper-parameters of all algorithms are joint as a single search space, which is not only huge but also…
The performance of reinforcement learning (RL) algorithms is sensitive to the choice of hyperparameters, with the learning rate being particularly influential. RL algorithms fail to reach convergence or demand an extensive number of samples…
Reinforcement learning (RL) has emerged as a promising strategy for finetuning small language models (SLMs) to solve targeted tasks such as math and coding. However, RL algorithms tend to be resource-intensive, taking a significant amount…
Evolutionary algorithms have been used to evolve a population of actors to generate diverse experiences for training reinforcement learning agents, which helps to tackle the temporal credit assignment problem and improves the exploration…