Related papers: A Tutorial on Thompson Sampling

200 papers

We address the problem of online sequential decision making, i.e., balancing the trade-off between exploiting the current knowledge to maximize immediate performance and exploring the new information to gain long-term benefits using the…

Machine Learning · Computer Science 2022-09-20 Kartik Anand Pant, Amod Hegde, K. V. Srinivas

Thompson Sampling is one of the most widely used and studied bandit algorithms, known for its simple structure, low regret performance, and solid theoretical guarantees. Yet, in stark contrast to most other families of bandit algorithms,…

Machine Learning · Computer Science 2026-05-28 Yanlin Qu, Hongseok Namkoong, Assaf Zeevi

Thompson sampling is an efficient algorithm for sequential decision making, which exploits the posterior uncertainty to address the exploration-exploitation dilemma. There has been significant recent interest in integrating Bayesian neural…

Machine Learning · Statistics 2020-08-07 Zhendong Wang, Mingyuan Zhou

Thompson sampling has emerged as an effective heuristic for a broad range of online decision problems. In its basic form, the algorithm requires computing and sampling from a posterior distribution over models, which is tractable only for…

Machine Learning · Statistics 2023-04-26 Xiuyuan Lu, Benjamin Van Roy

In this paper we consider an online recommendation setting, where a platform recommends a sequence of items to its users at every time period. The users respond by selecting one of the items recommended or abandon the platform due to…

Machine Learning · Computer Science 2019-04-16 Yunjuan Wang, Theja Tulabandhula

Thompson Sampling provides an efficient technique to introduce prior knowledge in the multi-armed bandit problem, along with providing remarkable empirical performance. In this paper, we revisit the Thompson Sampling algorithm under rewards…

Machine Learning · Computer Science 2019-12-09 Abhimanyu Dubey, Alex Pentland

Recent advances in contextual bandit optimization and reinforcement learning have garnered interest in applying these methods to real-world sequential decision making problems. Real-world applications frequently have constraints with…

Machine Learning · Computer Science 2019-11-05 Samuel Daulton, Shaun Singh, Vashist Avadhanula, Drew Dimmery, Eytan Bakshy

Thompson sampling provides a solution to bandit problems in which new observations are allocated to arms with the posterior probability that an arm is optimal. While sometimes easy to implement and asymptotically optimal, Thompson sampling…

Machine Learning · Computer Science 2014-10-16 Dean Eckles, Maurits Kaptein

Contextual multi-armed bandits are classical models in reinforcement learning for sequential decision-making associated with individual information. A widely-used policy for bandits is Thompson Sampling, where samples from a data-driven…

Machine Learning · Statistics 2021-11-30 Hongju Park, Mohamad Kazem Shirani Faradonbeh

We present a novel extension of Thompson Sampling for stochastic sequential decision problems with graph feedback, even when the graph structure itself is unknown and/or changing. We provide theoretical guarantees on the Bayesian regret of…

Machine Learning · Computer Science 2017-01-17 Aristide C. Y. Tossou, Christos Dimitrakakis, Devdatt Dubhashi

We study the effects of approximate inference on the performance of Thompson sampling in the $k$-armed bandit problems. Thompson sampling is a successful algorithm for online decision-making but requires posterior inference, which often…

Machine Learning · Computer Science 2020-01-16 My Phan, Yasin Abbasi-Yadkori, Justin Domke

We study the problem of online multi-task learning where the tasks are performed within similar but not necessarily identical multi-armed bandit environments. In particular, we study how a learner can improve its overall performance across…

Machine Learning · Computer Science 2022-06-20 Zhi Wang, Chicheng Zhang, Kamalika Chaudhuri

In algorithm optimization in reinforcement learning, how to deal with the exploration-exploitation dilemma is particularly important. Multi-armed bandit problem can optimize the proposed solutions by changing the reward distribution to…

Machine Learning · Statistics 2022-03-28 Zhendong Shi, Ercan E. Kuruoglu, Xiaoli Wei

Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems. In this paper, we propose a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both…

Machine Learning · Computer Science 2022-01-03 Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu

Thompson sampling, a Bayesian method for balancing exploration and exploitation in bandit problems, has theoretical guarantees and exhibits strong empirical performance in many domains. Traditional Thompson sampling, however, assumes…

Machine Learning · Computer Science 2018-12-04 Andrew Stirn, Tony Jebara

Thompson sampling has been shown to be an effective policy across a variety of online learning tasks. Many works have analyzed the finite time performance of Thompson sampling, and proved that it achieves a sub-linear regret under a broad…

Machine Learning · Computer Science 2020-11-10 Cem Kalkanli, Ayfer Ozgur

Thompson sampling (TS) is a Bayesian randomized exploration strategy that samples options (e.g., system parameters or control laws) from the current posterior and then applies the selected option that is optimal for a task, thereby…

Machine Learning · Computer Science 2026-02-06 Kaikai Zheng, Dawei Shi, Yang Shi, Long Wang

Pursuit-evasion is a multi-agent sequential decision problem wherein a group of agents known as pursuers coordinate their traversal of a spatial domain to locate an agent trying to evade them. Pursuit evasion problems arise in a number of…

Machine Learning · Computer Science 2018-11-13 Zhen Li, Nicholas J. Meyer, Eric B. Laber, Robert Brigantic

There is increasing interest in using streaming data to inform decision making across a wide range of application domains including mobile health, food safety, security, and resource management. A decision support system formalizes online…

Methodology · Statistics 2019-05-14 Tao Hu, Eric B. Laber, Zhen Li, Nick J. Meyer, Krishna Pacifici

How can we make use of information parallelism in online decision making problems while efficiently balancing the exploration-exploitation trade-off? In this paper, we introduce a batch Thompson Sampling framework for two canonical online…

Machine Learning · Computer Science 2021-06-04 Amin Karbasi, Vahab Mirrokni, Mohammad Shadravan
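
Several of the abstracts above describe the same core mechanism: sample from the current posterior, act optimally for that sample, and update on the observed reward. As a rough illustration only, and not the method of any listed paper, the following sketch applies that loop to a Bernoulli bandit with independent Beta(1, 1) priors. The function name `thompson_sampling`, the uniform prior, and the simulated reward probabilities are all assumptions made for the example.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative sketch).

    true_probs: hypothetical success probability of each arm, used only to
                simulate rewards; the learner never reads it directly.
    Returns per-arm pull counts, successes, and failures.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one sample per arm from its Beta posterior.
        # The +1 terms encode the assumed uniform Beta(1, 1) prior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling([0.1, 0.9], 500)
    print("pulls per arm:", pulls)
```

Because each arm is chosen exactly when its posterior sample is the largest, arms are played with the posterior probability that they are optimal, which is the allocation rule described in the Eckles and Kaptein abstract.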