Related papers: Parallelizing Thompson Sampling

Thompson Sampling with Virtual Helping Agents

We address the problem of online sequential decision making, i.e., balancing the trade-off between exploiting the current knowledge to maximize immediate performance and exploring the new information to gain long-term benefits using the…

Machine Learning · Computer Science 2022-09-20 Kartik Anand Pant , Amod Hegde , K. V. Srinivas

Batched Thompson Sampling

We introduce a novel anytime Batched Thompson sampling policy for multi-armed bandits where the agent observes the rewards of her actions and adjusts her policy only at the end of a small number of batches. We show that this policy…

Machine Learning · Computer Science 2021-10-04 Cem Kalkanli , Ayfer Ozgur

Analysis of Thompson Sampling for the multi-armed bandit problem

The multi-armed bandit problem is a popular model for studying exploration/exploitation trade-off in sequential decision problems. Many algorithms are now available for this well-studied problem. One of the earliest algorithms, given by W.…

Machine Learning · Computer Science 2012-04-10 Shipra Agrawal , Navin Goyal

Thompson Sampling for Contextual Bandits with Linear Payoffs

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better…

Machine Learning · Computer Science 2014-02-04 Shipra Agrawal , Navin Goyal

Batched Thompson Sampling for Multi-Armed Bandits

We study Thompson Sampling algorithms for stochastic multi-armed bandits in the batched setting, in which we want to minimize the regret over a sequence of arm pulls using a small number of policy changes (or, batches). We propose two…

Machine Learning · Computer Science 2021-08-17 Nikolai Karpov , Qin Zhang

Thompson Sampling in Partially Observable Contextual Bandits

Contextual bandits constitute a classical framework for decision-making under uncertainty. In this setting, the goal is to learn the arms of highest reward subject to contextual information, while the unknown reward parameters of each arm…

Machine Learning · Statistics 2024-02-19 Hongju Park , Mohamad Kazem Shirani Faradonbeh

A Broader View of Thompson Sampling

Thompson Sampling is one of the most widely used and studied bandit algorithms, known for its simple structure, low regret performance, and solid theoretical guarantees. Yet, in stark contrast to most other families of bandit algorithms,…

Machine Learning · Computer Science 2026-05-28 Yanlin Qu , Hongseok Namkoong , Assaf Zeevi

Thompson Sampling on Symmetric $\alpha$-Stable Bandits

Thompson Sampling provides an efficient technique to introduce prior knowledge in the multi-armed bandit problem, along with providing remarkable empirical performance. In this paper, we revisit the Thompson Sampling algorithm under rewards…

Machine Learning · Computer Science 2019-12-09 Abhimanyu Dubey , Alex Pentland

Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits

Contextual multi-armed bandits are classical models in reinforcement learning for sequential decision-making associated with individual information. A widely-used policy for bandits is Thompson Sampling, where samples from a data-driven…

Machine Learning · Statistics 2021-11-30 Hongju Park , Mohamad Kazem Shirani Faradonbeh

Neural Thompson Sampling

Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems. In this paper, we propose a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both…

Machine Learning · Computer Science 2022-01-03 Weitong Zhang , Dongruo Zhou , Lihong Li , Quanquan Gu

Thompson Sampling for Multi-Objective Linear Contextual Bandit

We study the multi-objective linear contextual bandit problem, where multiple possible conflicting objectives must be optimized simultaneously. We propose \texttt{MOL-TS}, the \textit{first} Thompson Sampling algorithm with Pareto regret…

Machine Learning · Statistics 2025-12-02 Somangchan Park , Heesang Ann , Min-hwan Oh

Thompson Sampling with Unrestricted Delays

We investigate properties of Thompson Sampling in the stochastic multi-armed bandit problem with delayed feedback. In a setting with i.i.d delays, we establish to our knowledge the first regret bounds for Thompson Sampling with arbitrary…

Machine Learning · Computer Science 2022-05-24 Han Wu , Stefan Wager

Thompson Sampling for a Fatigue-aware Online Recommendation System

In this paper we consider an online recommendation setting, where a platform recommends a sequence of items to its users at every time period. The users respond by selecting one of the items recommended or abandon the platform due to…

Machine Learning · Computer Science 2019-04-16 Yunjuan Wang , Theja Tulabandhula

Thompson Sampling on Asymmetric $\alpha$-Stable Bandits

In algorithm optimization in reinforcement learning, how to deal with the exploration-exploitation dilemma is particularly important. Multi-armed bandit problem can optimize the proposed solutions by changing the reward distribution to…

Machine Learning · Statistics 2022-03-28 Zhendong Shi , Ercan E. Kuruoglu , Xiaoli Wei

A Tutorial on Thompson Sampling

Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information…

Machine Learning · Computer Science 2020-07-16 Daniel Russo , Benjamin Van Roy , Abbas Kazerouni , Ian Osband , Zheng Wen

Thompson Sampling for the MNL-Bandit

We consider a sequential subset selection problem under parameter uncertainty, where at each time step, the decision maker selects a subset of cardinality $K$ from $N$ possible items (arms), and observes a (bandit) feedback in the form of…

Machine Learning · Computer Science 2019-01-07 Shipra Agrawal , Vashist Avadhanula , Vineet Goyal , Assaf Zeevi

Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning

Thompson sampling (TS) is widely used in sequential decision making due to its ease of use and appealing empirical performance. However, many existing analytical and empirical results for TS rely on restrictive assumptions on reward…

Machine Learning · Computer Science 2023-06-16 Amin Karbasi , Nikki Lijing Kuang , Yi-An Ma , Siddharth Mitra

Thompson Sampling with Information Relaxation Penalties

We consider a finite-horizon multi-armed bandit (MAB) problem in a Bayesian setting, for which we propose an information relaxation sampling framework. With this framework, we define an intuitive family of control policies that include…

Machine Learning · Computer Science 2021-06-17 Seungki Min , Costis Maglaras , Ciamac C. Moallemi

Efficient and Adaptive Posterior Sampling Algorithms for Bandits

We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards. As the existing problem-dependent regret bound for Thompson Sampling with Gaussian priors [Agrawal and Goyal, 2017] is vacuous when $T \le 288 e^{64}$,…

Machine Learning · Computer Science 2024-05-03 Bingshan Hu , Zhiming Huang , Tianyue H. Zhang , Mathias Lécuyer , Nidhi Hegde

Generalized Thompson Sampling for Contextual Bandits

Thompson Sampling, one of the oldest heuristics for solving multi-armed bandits, has recently been shown to demonstrate state-of-the-art performance. The empirical success has led to great interests in theoretical understanding of this…

Machine Learning · Computer Science 2013-10-29 Lihong Li