Related papers: Mixed-Effect Thompson Sampling

Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning

Thompson Sampling has been widely used for contextual bandit problems due to the flexibility of its modeling power. However, a general theory for this class of methods in the frequentist setting is still lacking. In this paper, we present a…

Machine Learning · Computer Science 2021-10-05 Tong Zhang

Generalized Thompson Sampling for Contextual Bandits

Thompson Sampling, one of the oldest heuristics for solving multi-armed bandits, has recently been shown to demonstrate state-of-the-art performance. The empirical success has led to great interest in theoretical understanding of this…

Machine Learning · Computer Science 2013-10-29 Lihong Li

Double-Linear Thompson Sampling for Context-Attentive Bandits

In this paper, we analyze and extend an online learning framework known as Context-Attentive Bandit, motivated by various practical applications, from medical diagnosis to dialog systems, where due to observation costs only a small subset…

Machine Learning · Computer Science 2020-10-20 Djallel Bouneffouf, Raphaël Féraud, Sohini Upadhyay, Yasaman Khazaeni, Irina Rish

Thompson Sampling with a Mixture Prior

We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution. This is relevant in multi-task learning, where a learning agent faces different classes of problems. We…

Machine Learning · Computer Science 2022-03-08 Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier

Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits

Contextual multi-armed bandits are classical models in reinforcement learning for sequential decision-making associated with individual information. A widely-used policy for bandits is Thompson Sampling, where samples from a data-driven…

Machine Learning · Statistics 2021-11-30 Hongju Park, Mohamad Kazem Shirani Faradonbeh

BFTS: Thompson Sampling with Bayesian Additive Regression Trees

Contextual bandits are a core technology for personalized mobile health interventions, where decision-making requires adapting to complex, non-linear user behaviors. While Thompson Sampling (TS) is a preferred strategy for these problems,…

Machine Learning · Statistics 2026-02-10 Ruizhe Deng, Bibhas Chakraborty, Ran Chen, Yan Shuo Tan

Multi-Task Learning for Contextual Bandits

Contextual bandits are a form of multi-armed bandit in which the agent has access to predictive side information (known as the context) for each arm at each time step, and have been used to model personalized news recommendation, ad…

Machine Learning · Statistics 2017-05-25 Aniket Anand Deshmukh, Urun Dogan, Clayton Scott

Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis

Meta-learning is characterized by its ability to learn how to learn, enabling the adaptation of learning strategies across different tasks. Recent research introduced the Meta-Thompson Sampling (Meta-TS), which meta-learns an unknown prior…

Machine Learning · Statistics 2024-09-12 Hao Li, Dong Liang, Zheng Xie

An Analysis of Ensemble Sampling

Ensemble sampling serves as a practical approximation to Thompson sampling when maintaining an exact posterior distribution over model parameters is computationally intractable. In this paper, we establish a regret bound that ensures…

Machine Learning · Computer Science 2023-03-02 Chao Qin, Zheng Wen, Xiuyuan Lu, Benjamin Van Roy

Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems

Restless bandit problems are instances of non-stationary multi-armed bandits. These problems have been studied well from the optimization perspective, where the goal is to efficiently find a near-optimal policy when system parameters are…

Machine Learning · Computer Science 2019-10-29 Young Hun Jung, Ambuj Tewari

Incorporating Behavioral Constraints in Online AI Systems

AI systems that learn through reward feedback about the actions they take are increasingly deployed in domains that have significant impact on our daily life. However, in many cases the online rewards should not be the only guiding…

Artificial Intelligence · Computer Science 2018-09-18 Avinash Balakrishnan, Djallel Bouneffouf, Nicholas Mattei, Francesca Rossi

Meta Learning in Bandits within Shared Affine Subspaces

We study the problem of meta-learning several contextual stochastic bandits tasks by leveraging their concentration around a low-dimensional affine subspace, which we learn via online principal component analysis to reduce the expected…

Machine Learning · Computer Science 2024-04-02 Steven Bilaj, Sofien Dhouib, Setareh Maghsudi

Thompson Sampling for Multi-Objective Linear Contextual Bandit

We study the multi-objective linear contextual bandit problem, where multiple possibly conflicting objectives must be optimized simultaneously. We propose MOL-TS, the first Thompson Sampling algorithm with Pareto regret…

Machine Learning · Statistics 2025-12-02 Somangchan Park, Heesang Ann, Min-hwan Oh

Contextual Thompson Sampling via Generation of Missing Data

We introduce a framework for Thompson sampling (TS) contextual bandit algorithms, in which the algorithm's ability to quantify uncertainty and make decisions depends on the quality of a generative model that is learned offline. Instead of…

Machine Learning · Computer Science 2025-11-13 Kelly W. Zhang, Tiffany Tianhui Cai, Hongseok Namkoong, Daniel Russo

Neural Thompson Sampling

Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems. In this paper, we propose a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both…

Machine Learning · Computer Science 2022-01-03 Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu

Thompson Sampling for Complex Bandit Problems

We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, where the decision maker plays a complex action rather than a basic arm in each round. The reward of the complex action is some function of…

Machine Learning · Statistics 2013-11-05 Aditya Gopalan, Shie Mannor, Yishay Mansour

Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits

In stochastic contextual bandits, an agent sequentially makes actions from a time-dependent action set based on past experience to minimize the cumulative regret. Like many other machine learning algorithms, the performance of bandits…

Machine Learning · Computer Science 2024-04-09 Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee

Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems

We propose a novel framework for structured bandits, which we call an influence diagram bandit. Our framework captures complex statistical dependencies between actions, latent variables, and observations; and thus unifies and extends many…

Machine Learning · Computer Science 2020-07-10 Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel

Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards

In this work, we study the performance of the Thompson Sampling algorithm for Contextual Bandit problems based on the framework introduced by Neu et al. and their concept of lifted information ratio. First, we prove a comprehensive bound on…

Machine Learning · Statistics 2023-04-27 Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, Mikael Skoglund

Meta-Thompson Sampling

Efficient exploration in bandits is a fundamental online learning problem. We propose a variant of Thompson sampling that learns to explore better as it interacts with bandit instances drawn from an unknown prior. The algorithm meta-learns…

Machine Learning · Computer Science 2021-06-24 Branislav Kveton, Mikhail Konobeev, Manzil Zaheer, Chih-wei Hsu, Martin Mladenov, Craig Boutilier, Csaba Szepesvari