Related papers: Distributed Thompson Sampling

200 papers

We study a decentralized cooperative multi-agent multi-armed bandit problem with $K$ arms and $N$ agents connected over a network. In our model, each arm's reward distribution is the same for all agents, and rewards are drawn independently…

Machine Learning · Statistics 2020-10-29 Anusha Lalitha, Andrea Goldsmith

Thompson sampling is one of the earliest randomized algorithms for multi-armed bandits (MAB). In this paper, we extend Thompson sampling to Budgeted MAB, where there is a random cost for pulling an arm and the total cost is constrained by…

Machine Learning · Computer Science 2015-05-04 Yingce Xia, Haifang Li, Tao Qin, Nenghai Yu, Tie-Yan Liu

The study of collaborative multi-agent bandits has attracted significant attention recently. In light of this, we initiate the study of a new collaborative setting, consisting of $N$ agents such that each agent is learning one of $M$…

Machine Learning · Computer Science 2024-07-04 Ronshee Chawla, Daniel Vial, Sanjay Shakkottai, R. Srikant

We address the problem of online sequential decision making, i.e., balancing the trade-off between exploiting current knowledge to maximize immediate performance and exploring new information to gain long-term benefits using the…

Machine Learning · Computer Science 2022-09-20 Kartik Anand Pant, Amod Hegde, K. V. Srinivas

We study the problem of regret minimization in a multi-armed bandit setup where the agent is allowed to play multiple arms at each round by spreading the resources usually allocated to only one arm. At each iteration the agent selects a…

Machine Learning · Computer Science 2021-06-01 Matias I. Müller, Cristian R. Rojas

In this paper, we introduce a distributed version of the classical stochastic Multi-Armed Bandit (MAB) problem. Our setting consists of a large number $n$ of agents that collaboratively and simultaneously solve the same instance of $K$-armed…

Machine Learning · Computer Science 2019-11-06 Abishek Sankararaman, Ayalvadi Ganesh, Sanjay Shakkottai

Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems. In this paper, we propose a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both…

Machine Learning · Computer Science 2022-01-03 Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu

In this paper, we propose a Thompson Sampling algorithm for unimodal bandits, where the expected reward is unimodal over the partially ordered arms. To exploit the unimodal structure better, at each step, instead of exploration from…

Machine Learning · Computer Science 2021-06-17 Long Yang, Zhao Li, Zehong Hu, Shasha Ruan, Shijian Li, Gang Pan, Hongyang Chen

We study a decentralized cooperative stochastic multi-armed bandit problem with $K$ arms on a network of $N$ agents. In our model, the reward distribution of each arm is the same for each agent and rewards are drawn independently across…

Machine Learning · Computer Science 2019-10-25 David Martínez-Rubio, Varun Kanade, Patrick Rebeschini

We propose algorithms based on a multi-level Thompson sampling scheme for the stochastic multi-armed bandit and its contextual variant with linear expected rewards, in the setting where arms are clustered. We show, both theoretically and…

Machine Learning · Computer Science 2022-06-16 Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson

In this paper, we study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We first analyze the standard TS algorithm for the general CMAB model when the outcome…

Machine Learning · Computer Science 2022-06-22 Siwei Wang, Wei Chen

We study Thompson Sampling algorithms for stochastic multi-armed bandits in the batched setting, in which we want to minimize the regret over a sequence of arm pulls using a small number of policy changes (or, batches). We propose two…

Machine Learning · Computer Science 2021-08-17 Nikolai Karpov, Qin Zhang

The multi-armed bandit problem is a popular model for studying the exploration/exploitation trade-off in sequential decision problems. Many algorithms are now available for this well-studied problem. One of the earliest algorithms, given by W.…

Machine Learning · Computer Science 2012-04-10 Shipra Agrawal, Navin Goyal

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better…

Machine Learning · Computer Science 2012-09-18 Shipra Agrawal, Navin Goyal

We consider regret minimization in a general collaborative multi-agent multi-armed bandit model, in which each agent faces a finite set of arms and may communicate with other agents through a central controller. The optimal arm for each…

Machine Learning · Computer Science 2023-12-18 Amitis Shidani, Sattar Vakili

We study a collaborative multi-agent stochastic linear bandit setting, where $N$ agents that form a network communicate locally to minimize their overall regret. In this setting, each agent has its own linear bandit problem (its own reward…

Machine Learning · Computer Science 2022-05-16 Ahmadreza Moradipari, Mohammad Ghavamzadeh, Mahnoosh Alizadeh

Recently, there has been extensive study of cooperative multi-agent multi-armed bandits where a set of distributed agents cooperatively play the same multi-armed bandit game. The goal is to develop bandit algorithms with the optimal group…

Machine Learning · Computer Science 2023-08-09 Lin Yang, Xuchuang Wang, Mohammad Hajiesmaili, Lijun Zhang, John C. S. Lui, Don Towsley

The multi-armed bandit (MAB) problem is a classical learning task that exemplifies the exploration-exploitation tradeoff. However, standard formulations do not take into account risk. In online decision-making systems, risk is a…

Machine Learning · Computer Science 2020-08-04 Qiuyu Zhu, Vincent Y. F. Tan

Thompson Sampling, one of the oldest heuristics for solving multi-armed bandits, has recently been shown to demonstrate state-of-the-art performance. The empirical success has led to great interest in theoretical understanding of this…

Machine Learning · Computer Science 2013-10-29 Lihong Li

Recent works have shown that agents facing independent instances of a stochastic $K$-armed bandit can collaborate to decrease regret. However, these works assume that each agent always recommends their individual best-arm estimates to other…

Machine Learning · Computer Science 2022-03-02 Daniel Vial, Sanjay Shakkottai, R. Srikant