Related papers: Efficient and Interpretable Bandit Algorithms

On the Safety of Interpretable Machine Learning: A Maximum Deviation Approach

Interpretable and explainable machine learning has seen a recent surge of interest. We focus on safety as a key motivation behind the surge and make the relationship between interpretability and safety more quantitative. Toward assessing…

Machine Learning · Computer Science 2022-11-04 Dennis Wei, Rahul Nair, Amit Dhurandhar, Kush R. Varshney, Elizabeth M. Daly, Moninder Singh

On Elimination Strategies for Bandit Fixed-Confidence Identification

Elimination algorithms for bandit identification, which prune the plausible correct answers sequentially until only one remains, are computationally convenient since they reduce the problem size over time. However, existing elimination…

Machine Learning · Computer Science 2022-10-25 Andrea Tirinzoni, Rémy Degenne

Jointly Efficient and Optimal Algorithms for Logistic Bandits

Logistic Bandits have recently undergone careful scrutiny by virtue of their combined theoretical and practical relevance. This research effort delivered statistically efficient algorithms, improving the regret of previous strategies by…

Machine Learning · Computer Science 2022-01-20 Louis Faury, Marc Abeille, Kwang-Sung Jun, Clément Calauzènes

Efficient Contextual Bandits with Uninformed Feedback Graphs

Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems, capturing many real-life applications. A recent work by Zhang et al. (2023) studies the contextual…

Machine Learning · Computer Science 2024-02-14 Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro

Designing an Interpretable Interface for Contextual Bandits

Contextual bandits have become an increasingly popular solution for personalized recommender systems. Despite their growing use, the interpretability of these systems remains a significant challenge, particularly for the often non-expert…

Machine Learning · Computer Science 2024-09-24 Andrew Maher, Matia Gobbo, Lancelot Lachartre, Subash Prabanantham, Rowan Swiers, Puli Liyanagama

Adapting to Misspecification in Contextual Bandits

A major research direction in contextual bandits is to develop algorithms that are computationally efficient, yet support flexible, general-purpose function approximation. Algorithms based on modeling rewards have shown strong empirical…

Machine Learning · Computer Science 2021-07-14 Dylan J. Foster, Claudio Gentile, Mehryar Mohri, Julian Zimmert

Stochastic bandits robust to adversarial corruptions

We introduce a new model of stochastic bandits with adversarial corruptions which aims to capture settings where most of the input follows a stochastic pattern but some fraction of it can be adversarially changed to trick the algorithm,…

Machine Learning · Computer Science 2018-03-28 Thodoris Lykouris, Vahab Mirrokni, Renato Paes Leme

A Classification View on Meta Learning Bandits

Contextual multi-armed bandits are a popular choice to model sequential decision-making. E.g., in a healthcare application we may perform various tests to assess a patient's condition (exploration) and then decide on the best treatment to give…

Machine Learning · Computer Science 2025-04-08 Mirco Mutti, Jeongyeol Kwon, Shie Mannor, Aviv Tamar

A Contextual Bandit Bake-off

Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems. Despite multiple recent successes on statistically and computationally efficient methods, the practical behavior of these…

Machine Learning · Statistics 2021-06-08 Alberto Bietti, Alekh Agarwal, John Langford

Contextual Bandits with Stage-wise Constraints

We study contextual bandits in the presence of a stage-wise constraint when the constraint must be satisfied both with high probability and in expectation. We start with the linear case where both the reward function and the stage-wise…

Machine Learning · Computer Science 2025-08-22 Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett

BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms

Speculative decoding has emerged as a popular method to accelerate the inference of Large Language Models (LLMs) while retaining their superior text generation performance. Previous methods either adopt a fixed speculative decoding…

Machine Learning · Computer Science 2025-11-21 Yunlong Hou, Fengzhuo Zhang, Cunxiao Du, Xuan Zhang, Jiachun Pan, Tianyu Pang, Chao Du, Vincent Y. F. Tan, Zhuoran Yang

Interpretable by Design: Learning Predictors by Composing Interpretable Queries

There is a growing concern about typically opaque decision-making with high-performance machine learning algorithms. Providing an explanation of the reasoning process in domain-specific terms can be crucial for adoption in risk-sensitive…

Computer Vision and Pattern Recognition · Computer Science 2022-11-28 Aditya Chattopadhyay, Stewart Slocum, Benjamin D. Haeffele, Rene Vidal, Donald Geman

Efficient learning by implicit exploration in bandit problems with side observations

We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that in addition…

Machine Learning · Computer Science 2026-04-28 Tomas Kocak, Gergely Neu, Michal Valko, Remi Munos

Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits

We introduce the "inverse bandit" problem of estimating the rewards of a multi-armed bandit instance from observing the learning process of a low-regret demonstrator. Existing approaches to the related problem of inverse reinforcement…

Machine Learning · Statistics 2022-02-23 Wenshuo Guo, Kumar Krishna Agrawal, Aditya Grover, Vidya Muthukumar, Ashwin Pananjady

Efficient Algorithms for Learning to Control Bandits with Unobserved Contexts

Contextual bandits are widely used in the study of learning-based control policies for finite action spaces. While the problem is well-studied for bandits with perfectly observed context vectors, little is known about the case of…

Machine Learning · Statistics 2022-02-03 Hongju Park, Mohamad Kazem Shirani Faradonbeh

Efficient Contextual Bandits with Continuous Actions

We create a computationally tractable algorithm for contextual bandits with continuous actions having unknown structure. Our reduction-style algorithm composes with most supervised learning representations. We prove that it works in a…

Machine Learning · Computer Science 2020-12-07 Maryam Majzoubi, Chicheng Zhang, Rajan Chari, Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins

SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits

In this paper, we study the problem of optimal data collection for policy evaluation in linear bandits. In policy evaluation, we are given a target policy and asked to estimate the expected reward it will obtain when executed in a…

Machine Learning · Statistics 2024-03-04 Subhojyoti Mukherjee, Qiaomin Xie, Josiah Hanna, Robert Nowak

Practical Bandits: An Industry Perspective

The bandit paradigm provides a unified modeling framework for problems that require decision-making under uncertainty. Because many business metrics can be viewed as rewards (a.k.a. utilities) that result from actions, bandit algorithms…

Machine Learning · Computer Science 2023-02-03 Bram van den Akker, Olivier Jeunen, Ying Li, Ben London, Zahra Nazari, Devesh Parekh

Algorithm Selection as a Bandit Problem with Unbounded Losses

Algorithm selection is typically based on models of algorithm performance, learned during a separate offline training sequence, which can be prohibitively expensive. In recent work, we adopted an online approach, in which a performance…

Artificial Intelligence · Computer Science 2013-01-31 Matteo Gagliolo, Juergen Schmidhuber

Optimal Multi-Fidelity Best-Arm Identification

In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible. We study multi-fidelity best-arm identification, in which the algorithm can choose to…

Machine Learning · Computer Science 2025-05-27 Riccardo Poiani, Rémy Degenne, Emilie Kaufmann, Alberto Maria Metelli, Marcello Restelli