Related papers: Meta-Thompson Sampling

Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis

Meta-learning is characterized by its ability to learn how to learn, enabling the adaptation of learning strategies across different tasks. Recent research introduced the Meta-Thompson Sampling (Meta-TS), which meta-learns an unknown prior…

Machine Learning · Statistics 2024-09-12 Hao Li, Dong Liang, Zheng Xie

No Regrets for Learning the Prior in Bandits

We propose AdaTS, a Thompson sampling algorithm that adapts sequentially to bandit tasks that it interacts with. The key idea in AdaTS is to adapt to an unknown task prior distribution by maintaining a distribution over its…

Machine Learning · Computer Science 2022-02-28 Soumya Basu, Branislav Kveton, Manzil Zaheer, Csaba Szepesvári

Metalearning Linear Bandits by Prior Update

Fully Bayesian approaches to sequential decision-making assume that problem parameters are generated from a known prior. In practice, such information is often lacking. This problem is exacerbated in setups with partial information, where a…

Machine Learning · Statistics 2022-08-08 Amit Peleg, Naama Pearl, Ron Meir

Metadata-based Multi-Task Bandits with Bayesian Hierarchical Models

How to explore efficiently is a central problem in multi-armed bandits. In this paper, we introduce the metadata-based multi-task bandit problem, where the agent needs to solve a large number of related multi-armed bandit tasks and can…

Machine Learning · Computer Science 2021-08-17 Runzhe Wan, Lin Ge, Rui Song

Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework

Online learning in large-scale structured bandits is known to be challenging due to the curse of dimensionality. In this paper, we propose a unified meta-learning framework for a general class of structured bandit problems where the…

Machine Learning · Computer Science 2022-03-01 Runzhe Wan, Lin Ge, Rui Song

Meta Learning in Bandits within Shared Affine Subspaces

We study the problem of meta-learning several contextual stochastic bandit tasks by leveraging their concentration around a low-dimensional affine subspace, which we learn via online principal component analysis to reduce the expected…

Machine Learning · Computer Science 2024-04-02 Steven Bilaj, Sofien Dhouib, Setareh Maghsudi

Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

Recent advances in deep reinforcement learning have made significant strides in performance on applications such as Go and Atari games. However, developing practical methods to balance exploration and exploitation in complex domains remains…

Machine Learning · Statistics 2018-02-27 Carlos Riquelme, George Tucker, Jasper Snoek

Hierarchical Bayesian Bandits

Meta-, multi-task, and federated learning can all be viewed as solving similar tasks, drawn from a distribution that reflects task similarities. We provide a unified view of all these problems, as learning to act in a hierarchical Bayesian…

Machine Learning · Computer Science 2022-03-08 Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh

Satisficing in Time-Sensitive Bandit Learning

Much of the recent literature on bandit learning focuses on algorithms that aim to converge on an optimal action. One shortcoming is that this orientation does not account for time sensitivity, which can play a crucial role when learning an…

Machine Learning · Computer Science 2020-01-09 Daniel Russo, Benjamin Van Roy

Time-Sensitive Bandit Learning and Satisficing Thompson Sampling

The literature on bandit learning and regret analysis has focused on contexts where the goal is to converge on an optimal action in a manner that limits exploration costs. One shortcoming imposed by this orientation is that it does not…

Machine Learning · Computer Science 2017-05-01 Daniel Russo, David Tse, Benjamin Van Roy

Non-Stationary Bandit Learning via Predictive Sampling

Thompson sampling has proven effective across a wide range of stationary bandit environments. However, as we demonstrate in this paper, it can perform poorly when applied to non-stationary environments. We attribute such failures to the…

Machine Learning · Computer Science 2025-05-06 Yueyang Liu, Xu Kuang, Benjamin Van Roy

Meta-Learning for Simple Regret Minimization

We develop a meta-learning framework for simple regret minimization in bandits. In this framework, a learning agent interacts with a sequence of bandit tasks, which are sampled i.i.d. from an unknown prior distribution, and learns its…

Machine Learning · Computer Science 2023-07-06 Mohammadjavad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya

Thompson Sampling with a Mixture Prior

We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution. This is relevant in multi-task learning, where a learning agent faces different classes of problems. We…

Machine Learning · Computer Science 2022-03-08 Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier

Learning to Optimize Via Posterior Sampling

This paper considers the use of a simple posterior sampling algorithm to balance between exploration and exploitation when learning to optimize actions such as in multi-armed bandit problems. The algorithm, also known as Thompson Sampling,…

Machine Learning · Computer Science 2014-02-04 Daniel Russo, Benjamin Van Roy

Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits

Contextual multi-armed bandits are classical models in reinforcement learning for sequential decision-making associated with individual information. A widely-used policy for bandits is Thompson Sampling, where samples from a data-driven…

Machine Learning · Statistics 2021-11-30 Hongju Park, Mohamad Kazem Shirani Faradonbeh

Double-Linear Thompson Sampling for Context-Attentive Bandits

In this paper, we analyze and extend an online learning framework known as Context-Attentive Bandit, motivated by various practical applications, from medical diagnosis to dialog systems, where due to observation costs only a small subset…

Machine Learning · Computer Science 2020-10-20 Djallel Bouneffouf, Raphaël Féraud, Sohini Upadhyay, Yasaman Khazaeni, Irina Rish

Thompson Sampling for Bandit Learning in Matching Markets

The problem of two-sided matching markets has a wide range of real-world applications and has been extensively studied in the literature. A line of recent works have focused on the problem setting where the preferences of one-side market…

Machine Learning · Computer Science 2022-05-03 Fang Kong, Junming Yin, Shuai Li

Mixed-Effect Thompson Sampling

A contextual bandit is a popular framework for online learning to act under uncertainty. In practice, the number of actions is huge and their expected rewards are correlated. In this work, we introduce a general framework for capturing such…

Machine Learning · Computer Science 2023-03-07 Imad Aouali, Branislav Kveton, Sumeet Katariya

Generalized Thompson Sampling for Contextual Bandits

Thompson Sampling, one of the oldest heuristics for solving multi-armed bandits, has recently been shown to demonstrate state-of-the-art performance. The empirical success has led to great interest in theoretical understanding of this…

Machine Learning · Computer Science 2013-10-29 Lihong Li

Neural Thompson Sampling

Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems. In this paper, we propose a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both…

Machine Learning · Computer Science 2022-01-03 Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu