Related papers: Conservative Exploration using Interleaving

Conservative Exploration in Reinforcement Learning

While learning in an unknown Markov Decision Process (MDP), an agent should trade off exploration to discover new information about the MDP, and exploitation of the current knowledge to maximize the reward. Although the agent will…

Machine Learning · Computer Science 2020-07-16 Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta

Efficient learning by implicit exploration in bandit problems with side observations

We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that in addition…

Machine Learning · Computer Science 2026-04-28 Tomas Kocak, Gergely Neu, Michal Valko, Remi Munos

Constrained Exploration in Reinforcement Learning with Optimality Preservation

We consider a class of reinforcement-learning systems in which the agent follows a behavior policy to explore a discrete state-action space to find an optimal policy while adhering to some restriction on its behavior. Such restriction may…

Machine Learning · Computer Science 2023-04-07 Peter C. Y. Chen

Active Learning with Safety Constraints

Active learning methods have shown great promise in reducing the number of samples necessary for learning. As automated learning systems are adopted into real-time, real-world decision-making pipelines, it is increasingly important that…

Machine Learning · Computer Science 2022-06-23 Romain Camilleri, Andrew Wagenmaker, Jamie Morgenstern, Lalit Jain, Kevin Jamieson

Conservative Agency via Attainable Utility Preservation

Reward functions are easy to misspecify; although designers can make corrections after observing mistakes, an agent pursuing a misspecified reward function can irreversibly change the state of its environment. If that change precludes…

Artificial Intelligence · Computer Science 2020-06-11 Alexander Matt Turner, Dylan Hadfield-Menell, Prasad Tadepalli

Active Learning within Constrained Environments through Imitation of an Expert Questioner

Active learning agents typically employ a query selection algorithm which solely considers the agent's learning objectives. However, this may be insufficient in more realistic human domains. This work uses imitation learning to enable an…

Machine Learning · Computer Science 2019-07-02 Kalesha Bullard, Yannick Schroecker, Sonia Chernova

Learning the Preferences of a Learning Agent

For AI systems to be useful to humans, they must understand and act in accordance with our values and preferences. Since specifying preferences is a hard task, inverse reinforcement learning (IRL) aims to develop methods that allow for…

Artificial Intelligence · Computer Science 2026-05-12 Karim Abdel Sadek, Mark Bedaywi, Rhys Gould, Stuart Russell

Deciding What to Learn: A Rate-Distortion Approach

Agents that learn to select optimal actions represent a prominent focus of the sequential decision-making literature. In the face of a complex environment or constraints on time and resources, however, aiming to synthesize such an optimal…

Machine Learning · Computer Science 2021-06-23 Dilip Arumugam, Benjamin Van Roy

Conservative Exploration for Policy Optimization via Off-Policy Policy Evaluation

A precondition for the deployment of a Reinforcement Learning agent to a real-world system is to provide guarantees on the learning process. While a learning algorithm will eventually converge to a good policy, there are no guarantees on…

Machine Learning · Statistics 2023-12-27 Paul Daoudi, Mathias Formoso, Othman Gaizi, Achraf Azize, Evrard Garcelon

Quick Best Action Identification in Linear Bandit Problems

In this paper, we consider a best action identification problem in the stochastic linear bandit setup with a fixed confidence constraint. In the considered best action identification problem, instead of minimizing the accumulative regret as…

Machine Learning · Computer Science 2018-12-04 Jun Geng, Lifeng Lai

Online combinatorial optimization with stochastic decision sets and adversarial losses

Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can…

Machine Learning · Computer Science 2026-04-29 Gergely Neu, Michal Valko

Contextual Bandits and Imitation Learning via Preference-Based Active Queries

We consider the problem of contextual bandits and imitation learning, where the learner lacks direct knowledge of the executed action's reward. Instead, the learner can actively query an expert at each round to compare two actions and…

Machine Learning · Computer Science 2023-07-25 Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu

An Information-Theoretic Analysis of Nonstationary Bandit Learning

In nonstationary bandit learning problems, the decision-maker must continually gather information and adapt their action selection as the latent state of the environment evolves. In each time period, some latent optimal action maximizes…

Machine Learning · Computer Science 2023-12-27 Seungki Min, Daniel Russo

Adaptive Decision Making via Entropy Minimization

An agent choosing between various actions tends to take the one with the lowest cost. But this choice is arguably too rigid (not adaptive) to be useful in complex situations, e.g., where the exploration-exploitation trade-off is relevant in…

Data Analysis, Statistics and Probability · Physics 2018-12-04 Armen E. Allahverdyan, Aram Galstyan, Ali E. Abbas, Zbigniew R. Struzik

Learning to be safe, in finite time

This paper aims to put forward the concept that learning to take safe actions in unknown environments, even with probability-one guarantees, can be achieved without the need for an unbounded number of exploratory trials, provided that one…

Machine Learning · Computer Science 2021-04-01 Agustin Castellano, Juan Bazerque, Enrique Mallada

Satisficing Exploration for Deep Reinforcement Learning

A default assumption in the design of reinforcement-learning algorithms is that a decision-making agent always explores to learn optimal behavior. In sufficiently complex environments that approach the vastness and scale of the real world,…

Machine Learning · Computer Science 2024-07-23 Dilip Arumugam, Saurabh Kumar, Ramki Gummadi, Benjamin Van Roy

Learning under Imitative Strategic Behavior with Unforeseeable Outcomes

Machine learning systems have been widely used to make decisions about individuals who may behave strategically to receive favorable outcomes, e.g., they may genuinely improve the true labels or manipulate observable features directly to…

Artificial Intelligence · Computer Science 2024-10-30 Tian Xie, Zhiqun Zuo, Mohammad Mahdi Khalili, Xueru Zhang

Exploration Conscious Reinforcement Learning Revisited

The exploration-exploitation tradeoff arises in Reinforcement Learning when one cannot tell if a policy is optimal. Then, there is a constant need to explore new actions instead of exploiting past experience. In practice, it is common to…

Machine Learning · Computer Science 2019-09-10 Lior Shani, Yonathan Efroni, Shie Mannor

Reward-Conditioned Policies

Reinforcement learning offers the promise of automating the acquisition of complex behavioral skills. However, compared to commonly used and well-understood supervised learning methods, reinforcement learning algorithms can be brittle,…

Machine Learning · Computer Science 2020-01-01 Aviral Kumar, Xue Bin Peng, Sergey Levine

Learning When Not to Learn: Risk-Sensitive Abstention in Bandits with Unbounded Rewards

In high-stakes AI applications, even a single action can cause irreparable damage. However, nearly all of sequential decision-making theory assumes that all errors are recoverable (e.g., by bounding rewards). Standard bandit algorithms that…

Machine Learning · Computer Science 2026-04-14 Sarah Liaw, Benjamin Plaut