Related papers: Thompson Sampling with Diffusion Generative Prior

Diffusion Models Meet Contextual Bandits

Efficient online decision-making in contextual bandits is challenging, as methods without informative priors often suffer from computational or statistical inefficiencies. In this work, we leverage pre-trained diffusion models as expressive…

Machine Learning · Computer Science 2025-10-29 Imad Aouali

Meta-Thompson Sampling

Efficient exploration in bandits is a fundamental online learning problem. We propose a variant of Thompson sampling that learns to explore better as it interacts with bandit instances drawn from an unknown prior. The algorithm meta-learns…

Machine Learning · Computer Science 2021-06-24 Branislav Kveton, Mikhail Konobeev, Manzil Zaheer, Chih-wei Hsu, Martin Mladenov, Craig Boutilier, Csaba Szepesvari

Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning

Thompson sampling (TS) has emerged as a robust technique for contextual bandit problems. However, TS requires posterior inference and optimization for action generation, prohibiting its use in many online platforms where latency and ease of…

Machine Learning · Computer Science 2024-07-23 Hongseok Namkoong, Samuel Daulton, Eytan Bakshy

Non-Stationary Bandit Learning via Predictive Sampling

Thompson sampling has proven effective across a wide range of stationary bandit environments. However, as we demonstrate in this paper, it can perform poorly when applied to non-stationary environments. We attribute such failures to the…

Machine Learning · Computer Science 2025-05-06 Yueyang Liu, Xu Kuang, Benjamin Van Roy

Learning To Sample From Diffusion Models Via Inverse Reinforcement Learning

Diffusion models generate samples through an iterative denoising process, guided by a neural network. While training the denoiser on real-world data is computationally demanding, the sampling procedure itself is more flexible. This…

Machine Learning · Computer Science 2026-02-10 Constant Bourdrez, Alexandre Vérine, Olivier Cappé

Satisficing in Time-Sensitive Bandit Learning

Much of the recent literature on bandit learning focuses on algorithms that aim to converge on an optimal action. One shortcoming is that this orientation does not account for time sensitivity, which can play a crucial role when learning an…

Machine Learning · Computer Science 2020-01-09 Daniel Russo, Benjamin Van Roy

Thompson Sampling for Robust Transfer in Multi-Task Bandits

We study the problem of online multi-task learning where the tasks are performed within similar but not necessarily identical multi-armed bandit environments. In particular, we study how a learner can improve its overall performance across…

Machine Learning · Computer Science 2022-06-20 Zhi Wang, Chicheng Zhang, Kamalika Chaudhuri

Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits

We consider the stochastic linear contextual bandit problem with high-dimensional features. We analyze the Thompson sampling algorithm using special classes of sparsity-inducing priors (e.g., spike-and-slab) to model the unknown parameter…

Machine Learning · Statistics 2023-01-31 Sunrit Chakraborty, Saptarshi Roy, Ambuj Tewari

Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems

We propose a novel framework for structured bandits, which we call an influence diagram bandit. Our framework captures complex statistical dependencies between actions, latent variables, and observations; and thus unifies and extends many…

Machine Learning · Computer Science 2020-07-10 Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel

Evolutionary Multi-Armed Bandits with Genetic Thompson Sampling

As two popular schools of machine learning, online learning and evolutionary computations have become two important driving forces behind real-world decision making engines for applications in biomedicine, economics, and engineering fields.…

Neural and Evolutionary Computing · Computer Science 2022-05-24 Baihan Lin

Diffusion Model for Generative Image Denoising

In supervised learning for image denoising, usually the paired clean images and noisy images are collected or synthesised to train a denoising model. L2 norm loss or other distance functions are used as the objective function for training.…

Computer Vision and Pattern Recognition · Computer Science 2023-02-07 Yutong Xie, Minne Yuan, Bin Dong, Quanzheng Li

Online Posterior Sampling with a Diffusion Prior

Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly or approximately using the Laplace approximation. The Gaussian prior is computationally efficient but it cannot describe complex distributions. In…

Machine Learning · Computer Science 2026-02-17 Branislav Kveton, Boris Oreshkin, Youngsuk Park, Aniket Deshmukh, Rui Song

Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits

Contextual multi-armed bandits are classical models in reinforcement learning for sequential decision-making associated with individual information. A widely-used policy for bandits is Thompson Sampling, where samples from a data-driven…

Machine Learning · Statistics 2021-11-30 Hongju Park, Mohamad Kazem Shirani Faradonbeh

A Tutorial on Thompson Sampling

Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information…

Machine Learning · Computer Science 2020-07-16 Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen

Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

Recent advances in deep reinforcement learning have made significant strides in performance on applications such as Go and Atari games. However, developing practical methods to balance exploration and exploitation in complex domains remains…

Machine Learning · Statistics 2018-02-27 Carlos Riquelme, George Tucker, Jasper Snoek

Generative Modeling with Diffusion

We provide an overview of the diffusion model as a method to generate new samples. Generative models have been recently adopted for tasks such as art generation (Stable Diffusion, Dall-E) and text generation (ChatGPT). Diffusion models in…

Machine Learning · Statistics 2025-06-13 Justin Le

Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework

Online learning in large-scale structured bandits is known to be challenging due to the curse of dimensionality. In this paper, we propose a unified meta-learning framework for a general class of structured bandit problems where the…

Machine Learning · Computer Science 2022-03-01 Runzhe Wan, Lin Ge, Rui Song

Meta Learning in Bandits within Shared Affine Subspaces

We study the problem of meta-learning several contextual stochastic bandits tasks by leveraging their concentration around a low-dimensional affine subspace, which we learn via online principal component analysis to reduce the expected…

Machine Learning · Computer Science 2024-04-02 Steven Bilaj, Sofien Dhouib, Setareh Maghsudi

Thompson Sampling with a Mixture Prior

We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution. This is relevant in multi-task learning, where a learning agent faces different classes of problems. We…

Machine Learning · Computer Science 2022-03-08 Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier

Foresight Diffusion: Improving Sampling Consistency in Predictive Diffusion Models

Diffusion and flow-based models have enabled significant progress in generation tasks across various modalities and have recently found applications in predictive learning. However, unlike typical generation tasks that encourage sample…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Yu Zhang, Xingzhuo Guo, Haoran Xu, Jialong Wu, Mingsheng Long