Related papers: Learning under Invariable Bayesian Safety

Learning to be safe, in finite time

This paper aims to put forward the concept that learning to take safe actions in unknown environments, even with probability one guarantees, can be achieved without the need for an unbounded number of exploratory trials, provided that one…

Machine Learning · Computer Science 2021-04-01 Agustin Castellano, Juan Bazerque, Enrique Mallada

Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis

This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL), such that the safety constraint violations are bounded at any point during learning. In a variety of RL applications the safety of the…

Machine Learning · Computer Science 2023-12-19 Rohan Mitta, Hosein Hasanbeig, Jun Wang, Daniel Kroening, Yiannis Kantaros, Alessandro Abate

Constrained Exploration and Recovery from Experience Shaping

We consider the problem of reinforcement learning under safety requirements, in which an agent is trained to complete a given task, typically formalized as the maximization of a reward signal over time, while concurrently avoiding…

Machine Learning · Computer Science 2018-09-25 Tu-Hoa Pham, Giovanni De Magistris, Don Joven Agravante, Subhajit Chaudhury, Asim Munawar, Ryuki Tachibana

Deep Bayesian Bandits: Exploring in Online Personalized Recommendations

Recommender systems trained in a continuous learning fashion are plagued by the feedback loop problem, also known as algorithmic bias. This causes a newly trained model to act greedily and favor items that have already been engaged by…

Machine Learning · Computer Science 2020-08-04 Dalin Guo, Sofia Ira Ktena, Ferenc Huszar, Pranay Kumar Myana, Wenzhe Shi, Alykhan Tejani

Information-Theoretic Safe Bayesian Optimization

We consider a sequential decision making task, where the goal is to optimize an unknown function without evaluating parameters that violate an a priori unknown (safety) constraint. A common approach is to place a Gaussian process prior on…

Machine Learning · Computer Science 2024-05-13 Alessandro G. Bottero, Carlos E. Luis, Julia Vinogradska, Felix Berkenkamp, Jan Peters

Safe Exploration Method for Reinforcement Learning under Existence of Disturbance

Recent rapid developments in reinforcement learning algorithms have been giving us novel possibilities in many fields. However, due to their exploring property, we have to take the risk into consideration when we apply those algorithms to…

Machine Learning · Computer Science 2023-03-21 Yoshihiro Okawa, Tomotake Sasaki, Hitoshi Yanami, Toru Namerikawa

Meta-Learning Bandit Policies by Gradient Ascent

Most bandit policies are designed to either minimize regret in any problem instance, making very few assumptions about the underlying environment, or in a Bayesian sense, assuming a prior distribution over environment parameters. The former…

Machine Learning · Computer Science 2021-01-07 Branislav Kveton, Martin Mladenov, Chih-Wei Hsu, Manzil Zaheer, Csaba Szepesvari, Craig Boutilier

Bayesian bandits: balancing the exploration-exploitation tradeoff via double sampling

Reinforcement learning studies how to balance exploration and exploitation in real-world systems, optimizing interactions with the world while simultaneously learning how the world operates. One general class of algorithms for such learning…

Machine Learning · Statistics 2018-08-10 Iñigo Urteaga, Chris H. Wiggins

Bayesian Incentive-Compatible Bandit Exploration

Individual decision-makers consume information revealed by the previous decision makers, and produce information that may help in future decisions. This phenomenon is common in a wide range of scenarios in the Internet economy, as well as…

Computer Science and Game Theory · Computer Science 2019-05-06 Yishay Mansour, Aleksandrs Slivkins, Vasilis Syrgkanis

Verifiably Safe Exploration for End-to-End Reinforcement Learning

Deploying deep reinforcement learning in safety-critical settings requires developing algorithms that obey hard constraints during exploration. This paper contributes a first approach toward enforcing formal safety constraints on end-to-end…

Artificial Intelligence · Computer Science 2020-07-03 Nathan Hunt, Nathan Fulton, Sara Magliacane, Nghia Hoang, Subhro Das, Armando Solar-Lezama

Excursion Search for Constrained Bayesian Optimization under a Limited Budget of Failures

When learning to ride a bike, a child falls down a number of times before achieving the first success. As falling down usually has only mild consequences, it can be seen as a tolerable failure in exchange for a faster learning process, as…

Machine Learning · Computer Science 2020-05-18 Alonso Marco, Alexander von Rohr, Dominik Baumann, José Miguel Hernández-Lobato, Sebastian Trimpe

Active Learning with Safety Constraints

Active learning methods have shown great promise in reducing the number of samples necessary for learning. As automated learning systems are adopted into real-time, real-world decision-making pipelines, it is increasingly important that…

Machine Learning · Computer Science 2022-06-23 Romain Camilleri, Andrew Wagenmaker, Jamie Morgenstern, Lalit Jain, Kevin Jamieson

Bandit Algorithms for Policy Learning: Methods, Implementation, and Welfare-performance

Static supervised learning, in which experimental data serves as a training sample for the estimation of an optimal treatment assignment policy, is a commonly assumed framework of policy learning. An arguably more realistic but challenging…

Econometrics · Economics 2024-09-04 Toru Kitagawa, Jeff Rowley

Differentiable Bandit Exploration

Exploration policies in Bayesian bandits maximize the average reward over problem instances drawn from some distribution $\mathcal{P}$. In this work, we learn such policies for an unknown distribution $\mathcal{P}$ using samples from…

Machine Learning · Computer Science 2020-06-11 Craig Boutilier, Chih-Wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer

Occupancy Map Building through Bayesian Exploration

We propose a novel holistic approach for safe autonomous exploration and map building based on constrained Bayesian optimisation. This method finds optimal continuous paths instead of discrete sensing locations that inherently satisfy…

Robotics · Computer Science 2017-03-02 Gilad Francis, Lionel Ott, Roman Marchant, Fabio Ramos

A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem

Bandit learning is characterized by the tension between long-term exploration and short-term exploitation. However, as has recently been noted, in settings in which the choices of the learning algorithm correspond to important decisions…

Machine Learning · Computer Science 2018-01-11 Sampath Kannan, Jamie Morgenstern, Aaron Roth, Bo Waggoner, Zhiwei Steven Wu

Learning to Act Safely with Limited Exposure and Almost Sure Certainty

This paper puts forward the concept that learning to take safe actions in unknown environments, even with probability one guarantees, can be achieved without the need for an unbounded number of exploratory trials. This is indeed possible,…

Systems and Control · Electrical Eng. & Systems 2023-02-14 Agustin Castellano, Hancheng Min, Juan Bazerque, Enrique Mallada

Model-Based Bayesian Exploration

Reinforcement learning systems are often concerned with balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of…

Artificial Intelligence · Computer Science 2013-01-30 Richard Dearden, Nir Friedman, David Andre

Evaluating Online Bandit Exploration In Large-Scale Recommender System

Bandit learning has been an increasingly popular design choice for recommender systems. Despite the strong interest in bandit learning from the community, there remain multiple bottlenecks that prevent many bandit learning approaches from…

Information Retrieval · Computer Science 2023-08-01 Hongbo Guo, Ruben Naeff, Alex Nikulkov, Zheqing Zhu

Linear Stochastic Bandits Under Safety Constraints

Bandit algorithms have various applications in safety-critical systems, where it is important to respect the system constraints that rely on the bandit's unknown parameters at every round. In this paper, we formulate a linear stochastic…

Machine Learning · Computer Science 2019-08-19 Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis