Related papers: Online Learning for Active Cache Synchronization

Online Learning for Cooperative Multi-Player Multi-Armed Bandits

We introduce a framework for decentralized online learning for multi-armed bandits (MAB) with multiple cooperative players. The reward obtained by the players in each round depends on the actions taken by all the players. It's a team…

Machine Learning · Computer Science 2021-09-10 William Chang, Mehdi Jafarnia-Jahromi, Rahul Jain

Online Learning and Bandits with Queried Hints

We consider the classic online learning and stochastic multi-armed bandit (MAB) problems, when at each step, the online policy can probe and find out which of a small number ($k$) of choices has better reward (or loss) before making its…

Data Structures and Algorithms · Computer Science 2022-11-08 Aditya Bhaskara, Sreenivas Gollapudi, Sungjin Im, Kostas Kollias, Kamesh Munagala

SIC-MMAB: Synchronisation Involves Communication in Multiplayer Multi-Armed Bandits

Motivated by cognitive radio networks, we consider the stochastic multiplayer multi-armed bandit problem, where several players pull arms simultaneously and collisions occur if one of them is pulled by several players at the same stage. We…

Machine Learning · Computer Science 2019-11-20 Etienne Boursier, Vianney Perchet

Combinatorial Allocation Bandits with Nonlinear Arm Utility

A matching platform is a system that matches different types of participants, such as companies and job-seekers. In such a platform, merely maximizing the number of matches can result in matches being concentrated on highly popular…

Machine Learning · Computer Science 2026-03-10 Yuki Shibukawa, Koichi Tanaka, Yuta Saito, Shinji Ito

Multi-armed Bandits with Cost Subsidy

In this paper, we consider a novel variant of the multi-armed bandit (MAB) problem, MAB with cost subsidy, which models many real-life applications where the learning agent has to pay to select an arm and is concerned about optimizing…

Machine Learning · Computer Science 2021-03-16 Deeksha Sinha, Karthik Abinav Sankararaman, Abbas Kazerouni, Vashist Avadhanula

Contextual Bandits with Similarity Information

In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a time-invariant set of alternatives and receives the payoff associated with this alternative. While the case of small…

Data Structures and Algorithms · Computer Science 2014-05-21 Aleksandrs Slivkins

Decentralized Asynchronous Multi-player Bandits

In recent years, multi-player multi-armed bandits (MP-MAB) have been extensively studied due to their wide applications in cognitive radio networks and Internet of Things systems. While most existing research on MP-MAB focuses on…

Machine Learning · Computer Science 2025-10-01 Jingqi Fan, Canzhe Zhao, Shuai Li, Siwei Wang

Stochastic Rising Bandits

This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. arm). We study a particular case of the rested…

Machine Learning · Computer Science 2022-12-08 Alberto Maria Metelli, Francesco Trovò, Matteo Pirola, Marcello Restelli

Corralling a Band of Bandit Algorithms

We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the best base algorithm if it were to be…

Machine Learning · Computer Science 2017-06-07 Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert E. Schapire

Multi-Player Multi-Armed Bandits with Finite Shareable Resources Arms: Learning Algorithms & Applications

Multi-player multi-armed bandits (MMAB) study how decentralized players cooperatively play the same multi-armed bandit so as to maximize their total cumulative rewards. Existing MMAB models mostly assume when more than one player pulls the…

Machine Learning · Computer Science 2022-04-29 Xuchuang Wang, Hong Xie, John C. S. Lui

Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback

We study the multi-armed bandit (MAB) problem with composite and anonymous feedback. In this model, the reward of pulling an arm spreads over a period of time (we call this period the reward interval) and the player receives partial rewards…

Machine Learning · Computer Science 2020-12-16 Siwei Wang, Haoyun Wang, Longbo Huang

Speed Up the Cold-Start Learning in Two-Sided Bandits with Many Arms

Multi-armed bandit (MAB) algorithms are efficient approaches to reduce the opportunity cost of online experimentation and are used by companies to find the best product from periodically refreshed product catalogs. However, these algorithms…

Machine Learning · Computer Science 2024-12-19 Mohsen Bayati, Junyu Cao, Wanning Chen

Incentivized Bandit Learning with Self-Reinforcing User Preferences

In this paper, we investigate a new multi-armed bandit (MAB) online learning model that considers real-world phenomena in many recommender systems: (i) the learning agent cannot pull the arms by itself and thus has to offer rewards to users…

Machine Learning · Computer Science 2021-06-01 Tianchen Zhou, Jia Liu, Chaosheng Dong, Jingyuan Deng

Nearly Optimal Adaptive Procedure with Change Detection for Piecewise-Stationary Bandit

Multi-armed bandit (MAB) is a class of online learning problems where a learning agent aims to maximize its expected cumulative reward while repeatedly selecting to pull arms with unknown reward distributions. We consider a scenario where…

Machine Learning · Statistics 2019-01-25 Yang Cao, Zheng Wen, Branislav Kveton, Yao Xie

On Distributed Multi-player Multiarmed Bandit Problems in Abruptly Changing Environment

We study the multi-player stochastic multiarmed bandit (MAB) problem in an abruptly changing environment. We consider a collision model in which a player receives reward at an arm if it is the only player to select the arm. We design two…

Machine Learning · Statistics 2018-12-14 Lai Wei, Vaibhav Srivastava

Multi-armed Bandit Requiring Monotone Arm Sequences

In many online learning or multi-armed bandit problems, the taken actions or pulled arms are ordinal and required to be monotone over time. Examples include dynamic pricing, in which the firms use markup pricing policies to please early…

Machine Learning · Computer Science 2021-10-08 Ningyuan Chen

Bandit Learning in Decentralized Matching Markets

We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience. Also, we assume…

Machine Learning · Computer Science 2021-06-23 Lydia T. Liu, Feng Ruan, Horia Mania, Michael I. Jordan

Global Bandits

Multi-armed bandits (MAB) model sequential decision making problems, in which a learner sequentially chooses arms with unknown reward distributions in order to maximize its cumulative reward. Most of the prior work on MAB assumes that the…

Machine Learning · Computer Science 2018-03-22 Onur Atan, Cem Tekin, Mihaela van der Schaar

Combinatorial Bandits under Strategic Manipulations

Strategic behavior against sequential learning methods, such as "click framing" in real recommendation systems, has been widely observed. Motivated by such behavior, we study the problem of combinatorial multi-armed bandits (CMAB) under…

Machine Learning · Computer Science 2021-11-22 Jing Dong, Ke Li, Shuai Li, Baoxiang Wang

Multi-Armed Bandits with Dependent Arms

We study a variant of the classical multi-armed bandit problem (MABP), which we call Multi-Armed Bandits with dependent arms. More specifically, multiple arms are grouped together to form a cluster, and the reward distributions of arms…

Machine Learning · Computer Science 2020-10-27 Rahul Singh, Fang Liu, Yin Sun, Ness Shroff