Related papers: Neural Thompson Sampling

Thompson Sampling for Contextual Bandits with Linear Payoffs

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better…

Machine Learning · Computer Science 2014-02-04 Shipra Agrawal, Navin Goyal

Analysis of Thompson Sampling for the multi-armed bandit problem

The multi-armed bandit problem is a popular model for studying exploration/exploitation trade-off in sequential decision problems. Many algorithms are now available for this well-studied problem. One of the earliest algorithms, given by W.…

Machine Learning · Computer Science 2012-04-10 Shipra Agrawal, Navin Goyal

Generalized Thompson Sampling for Contextual Bandits

Thompson Sampling, one of the oldest heuristics for solving multi-armed bandits, has recently been shown to demonstrate state-of-the-art performance. The empirical success has led to great interests in theoretical understanding of this…

Machine Learning · Computer Science 2013-10-29 Lihong Li

Thompson Sampling on Symmetric $\alpha$-Stable Bandits

Thompson Sampling provides an efficient technique to introduce prior knowledge in the multi-armed bandit problem, along with providing remarkable empirical performance. In this paper, we revisit the Thompson Sampling algorithm under rewards…

Machine Learning · Computer Science 2019-12-09 Abhimanyu Dubey, Alex Pentland

Efficient and Adaptive Posterior Sampling Algorithms for Bandits

We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards. As the existing problem-dependent regret bound for Thompson Sampling with Gaussian priors [Agrawal and Goyal, 2017] is vacuous when $T \le 288 e^{64}$,…

Machine Learning · Computer Science 2024-05-03 Bingshan Hu, Zhiming Huang, Tianyue H. Zhang, Mathias Lécuyer, Nidhi Hegde

Thompson Sampling for Combinatorial Semi-Bandits

In this paper, we study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We first analyze the standard TS algorithm for the general CMAB model when the outcome…

Machine Learning · Computer Science 2022-06-22 Siwei Wang, Wei Chen

Thompson Sampling for Unimodal Bandits

In this paper, we propose a Thompson Sampling algorithm for unimodal bandits, where the expected reward is unimodal over the partially ordered arms. To exploit the unimodal structure better, at each step, instead of exploration from…

Machine Learning · Computer Science 2021-06-17 Long Yang, Zhao Li, Zehong Hu, Shasha Ruan, Shijian Li, Gang Pan, Hongyang Chen

Thompson Sampling for Bandits with Clustered Arms

We propose algorithms based on a multi-level Thompson sampling scheme, for the stochastic multi-armed bandit and its contextual variant with linear expected rewards, in the setting where arms are clustered. We show, both theoretically and…

Machine Learning · Computer Science 2022-06-16 Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson

Thompson Sampling for Linearly Constrained Bandits

We address multi-armed bandits (MAB) where the objective is to maximize the cumulative reward under a probabilistic linear constraint. For a few real-world instances of this problem, constrained extensions of the well-known Thompson…

Machine Learning · Computer Science 2020-05-14 Vidit Saxena, Joseph E. Gonzalez, Joakim Jaldén

A Broader View of Thompson Sampling

Thompson Sampling is one of the most widely used and studied bandit algorithms, known for its simple structure, low regret performance, and solid theoretical guarantees. Yet, in stark contrast to most other families of bandit algorithms,…

Machine Learning · Computer Science 2026-05-28 Yanlin Qu, Hongseok Namkoong, Assaf Zeevi

Thompson Sampling with Virtual Helping Agents

We address the problem of online sequential decision making, i.e., balancing the trade-off between exploiting the current knowledge to maximize immediate performance and exploring the new information to gain long-term benefits using the…

Machine Learning · Computer Science 2022-09-20 Kartik Anand Pant, Amod Hegde, K. V. Srinivas

Further Optimal Regret Bounds for Thompson Sampling

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better…

Machine Learning · Computer Science 2012-09-18 Shipra Agrawal, Navin Goyal

Thompson Sampling Algorithms for Cascading Bandits

Motivated by the pressing need for efficient optimization in online recommender systems, we revisit the cascading bandit model proposed by Kveton et al. (2015). While Thompson sampling (TS) algorithms have been shown to be empirically…

Machine Learning · Computer Science 2021-05-18 Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan

Thompson Sampling for Multi-Objective Linear Contextual Bandit

We study the multi-objective linear contextual bandit problem, where multiple possibly conflicting objectives must be optimized simultaneously. We propose MOL-TS, the first Thompson Sampling algorithm with Pareto regret…

Machine Learning · Statistics 2025-12-02 Somangchan Park, Heesang Ann, Min-hwan Oh

Optimistic Thompson Sampling for No-Regret Learning in Unknown Games

This work tackles the complexities of multi-player scenarios in unknown games, where the primary challenge lies in navigating the uncertainty of the environment through bandit feedback alongside strategic decision-making. We…

Machine Learning · Computer Science 2024-02-27 Yingru Li, Liangqi Liu, Wenqiang Pu, Hao Liang, Zhi-Quan Luo

Robust Thompson Sampling Algorithms Against Reward Poisoning Attacks

Thompson sampling is one of the most popular learning algorithms for online sequential decision-making problems and has rich real-world applications. However, current Thompson sampling algorithms are limited by the assumption that the…

Machine Learning · Computer Science 2024-10-28 Yinglun Xu, Zhiwei Wang, Gagandeep Singh

Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring

We investigate finite stochastic partial monitoring, which is a general model for sequential learning with limited feedback. While Thompson sampling is one of the most promising algorithms on a variety of online decision-making problems,…

Machine Learning · Statistics 2021-06-11 Taira Tsuchiya, Junya Honda, Masashi Sugiyama

Learning to Optimize Via Posterior Sampling

This paper considers the use of a simple posterior sampling algorithm to balance between exploration and exploitation when learning to optimize actions such as in multi-armed bandit problems. The algorithm, also known as Thompson Sampling,…

Machine Learning · Computer Science 2014-02-04 Daniel Russo, Benjamin Van Roy

Thompson Sampling in Non-Episodic Restless Bandits

Restless bandit problems assume time-varying reward distributions of the arms, which adds flexibility to the model but makes the analysis more challenging. We study learning algorithms over the unknown reward distributions and prove a…

Machine Learning · Computer Science 2019-10-15 Young Hun Jung, Marc Abeille, Ambuj Tewari

A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms

In this paper we propose a general methodology to derive regret bounds for randomized multi-armed bandit algorithms. It consists in checking a set of sufficient conditions on the sampling probability of each arm and on the family of…

Machine Learning · Computer Science 2024-11-14 Dorian Baudry, Kazuya Suzuki, Junya Honda