Related papers: Predictive Bandits

Preselection Bandits

In this paper, we introduce the Preselection Bandit problem, in which the learner preselects a subset of arms (choice alternatives) for a user, who then chooses the final arm from this subset. The learner is not aware of the user's…

Machine Learning · Computer Science 2021-12-23 Viktor Bengs, Eyke Hüllermeier

Multi-Armed Bandits with Censored Consumption of Resources

We consider a resource-aware variant of the classical multi-armed bandit problem: In each round, the learner selects an arm and determines a resource limit. It then observes a corresponding (random) reward, provided the (random) amount of…

Machine Learning · Computer Science 2022-10-18 Viktor Bengs, Eyke Hüllermeier

Neural Dueling Bandits: Preference-Based Optimization with Human Feedback

Contextual dueling bandits model bandit problems in which a learner's goal is to find the best arm for a given context, using noisy human preference feedback observed over the arms selected for past contexts. However,…

Machine Learning · Computer Science 2025-04-17 Arun Verma, Zhongxiang Dai, Xiaoqiang Lin, Patrick Jaillet, Bryan Kian Hsiang Low

From Finite to Countable-Armed Bandits

We consider a stochastic bandit problem with countably many arms that belong to a finite set of types, each characterized by a unique mean reward. In addition, there is a fixed distribution over types which sets the proportion of each type…

Machine Learning · Computer Science 2021-05-25 Anand Kalvit, Assaf Zeevi

Robustness Guarantees for Mode Estimation with an Application to Bandits

Mode estimation is a classical problem in statistics with a wide range of applications in machine learning. Despite this, there is little understanding of its robustness properties under possibly adversarial data contamination. In this…

Machine Learning · Computer Science 2020-03-09 Aldo Pacchiano, Heinrich Jiang, Michael I. Jordan

Adaptive Bandit Algorithms for Contextual Matching Markets

We study bandit learning in matching markets, where players and arms constitute the two market sides, and the players' utilities are linear in the arm contexts. In each round, new arms arrive with observable contexts. Then, the algorithm…

Machine Learning · Computer Science 2026-05-28 Shiyun Lin, Simon Mauras, Vianney Perchet, Nadav Merlis

Online Learning and Bandits with Queried Hints

We consider the classic online learning and stochastic multi-armed bandit (MAB) problems, when at each step, the online policy can probe and find out which of a small number ($k$) of choices has better reward (or loss) before making its…

Data Structures and Algorithms · Computer Science 2022-11-08 Aditya Bhaskara, Sreenivas Gollapudi, Sungjin Im, Kostas Kollias, Kamesh Munagala

Recovering Bandits

We study the recovering bandits problem, a variant of the stochastic multi-armed bandit problem where the expected reward of each arm varies according to some unknown function of the time since the arm was last played. While being a natural…

Machine Learning · Statistics 2019-11-01 Ciara Pike-Burke, Steffen Grünewälder

Top $K$ Ranking for Multi-Armed Bandit with Noisy Evaluations

We consider a multi-armed bandit setting where, at the beginning of each round, the learner receives noisy, independent, and possibly biased evaluations of the true reward of each arm and it selects $K$ arms with the objective of…

Machine Learning · Computer Science 2022-04-13 Evrard Garcelon, Vashist Avadhanula, Alessandro Lazaric, Matteo Pirotta

A Novel Confidence-Based Algorithm for Structured Bandits

We study finite-armed stochastic bandits where the rewards of each arm might be correlated to those of other arms. We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the…

Machine Learning · Computer Science 2020-05-26 Andrea Tirinzoni, Alessandro Lazaric, Marcello Restelli

Global Bandits

Multi-armed bandits (MAB) model sequential decision making problems, in which a learner sequentially chooses arms with unknown reward distributions in order to maximize its cumulative reward. Most of the prior work on MAB assumes that the…

Machine Learning · Computer Science 2018-03-22 Onur Atan, Cem Tekin, Mihaela van der Schaar

Stochastic Contextual Bandits with Known Reward Functions

Many sequential decision-making problems in communication networks can be modeled as contextual bandit problems, which are natural extensions of the well-known multi-armed bandit problem. In contextual bandit problems, at each time, an…

Machine Learning · Computer Science 2016-05-10 Pranav Sakulkar, Bhaskar Krishnamachari

Stochastic Bandit Based on Empirical Moments

In the multiarmed bandit problem a gambler chooses an arm of a slot machine to pull considering a tradeoff between exploration and exploitation. We study the stochastic bandit problem where each arm has a reward distribution supported in a…

Statistics Theory · Mathematics 2013-03-29 Junya Honda, Akimichi Takemura

Understanding the stochastic dynamics of sequential decision-making processes: A path-integral analysis of multi-armed bandits

The multi-armed bandit (MAB) model is one of the most classical models to study decision-making in an uncertain environment. In this model, a player chooses one of $K$ possible arms of a bandit machine to play at each time step, where the…

Machine Learning · Computer Science 2023-06-13 Bo Li, Chi Ho Yeung

Lower Bounds for Multi-armed Bandit with Non-equivalent Multiple Plays

We study the stochastic multi-armed bandit problem with non-equivalent multiple plays where, at each step, an agent chooses not only a set of arms, but also their order, which influences reward distribution. In several problem formulations…

Machine Learning · Computer Science 2015-07-20 Aleksandr Vorobev, Gleb Gusev

Sparsity-Agnostic Linear Bandits with Adaptive Adversaries

We study stochastic linear bandits where, in each round, the learner receives a set of actions (i.e., feature vectors), from which it chooses an element and obtains a stochastic reward. The expected reward is a fixed but unknown linear…

Machine Learning · Computer Science 2024-06-04 Tianyuan Jin, Kyoungseok Jang, Nicolò Cesa-Bianchi

Algorithms for Linear Bandits on Polyhedral Sets

We study the stochastic linear optimization problem with bandit feedback. The arms take values in an $N$-dimensional space and belong to a bounded polyhedron described by finitely many linear inequalities. We provide a lower bound for…

Machine Learning · Computer Science 2015-09-29 Manjesh K. Hanawal, Amir Leshem, Venkatesh Saligrama

Restless Bandit Problem with Rewards Generated by a Linear Gaussian Dynamical System

Decision-making under uncertainty is a fundamental problem encountered frequently and can be formulated as a stochastic multi-armed bandit problem. In the problem, the learner interacts with an environment by choosing an action at each…

Machine Learning · Statistics 2024-05-24 Jonathan Gornet, Bruno Sinopoli

Influential Bandits: Pulling an Arm May Change the Environment

While classical formulations of multi-armed bandit problems assume that each arm's reward is independent and stationary, real-world applications often involve non-stationary environments and interdependencies between arms. In particular,…

Machine Learning · Computer Science 2025-06-19 Ryoma Sato, Shinji Ito

Combinatorial Blocking Bandits with Stochastic Delays

Recent work has considered natural variations of the multi-armed bandit problem, where the reward distribution of each arm is a special function of the time passed since it was last pulled. In this direction, a simple (yet widely applicable)…

Machine Learning · Computer Science 2021-05-25 Alexia Atsidakou, Orestis Papadigenopoulos, Soumya Basu, Constantine Caramanis, Sanjay Shakkottai