Related papers: Constrained Linear Thompson Sampling

Safe Linear Thompson Sampling with Side Information

The design and performance analysis of bandit algorithms in the presence of stage-wise safety or reliability constraints has recently garnered significant interest. In this work, we consider the linear stochastic bandit problem under…

Machine Learning · Computer Science 2020-03-03 Ahmadreza Moradipari, Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis
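Stage-wise safety means every action played must satisfy the constraint, not just on average. A common way to enforce this in linear bandits is to play only actions whose pessimistic (upper-confidence) cost estimate meets the threshold, and to run the Thompson sampling step inside that estimated safe set. The sketch below shows one such round. It assumes a separate unknown cost vector observed with noise, a confidence multiplier beta, and a sampling scale v. These modeling choices and the function name safe_linear_ts_step are hypothetical. The truncated abstract does not state the paper's exact constraint model or how side information is used.

```python
import numpy as np


def safe_linear_ts_step(actions, V, b_reward, b_cost, c, beta, v, rng):
    """One round of a generic safe linear Thompson sampling step (hypothetical sketch).

    actions  : (K, d) candidate action vectors
    V        : (d, d) regularized Gram matrix, lambda * I + sum_t a_t a_t^T
    b_reward : (d,) running sum of a_t * reward_t
    b_cost   : (d,) running sum of a_t * cost_t (assumed noisy cost feedback)
    c        : safety threshold; an action is safe if a^T mu_cost <= c
    beta     : confidence-width multiplier (assumed tuning parameter)
    v        : posterior sampling scale (assumed tuning parameter)
    Returns the index of the chosen action, or None if no action is certified safe.
    """
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b_reward  # ridge estimate of the reward parameter
    mu_hat = V_inv @ b_cost       # ridge estimate of the cost parameter
    # Confidence width ||a||_{V^{-1}} for every action.
    widths = np.sqrt(np.einsum("ij,jk,ik->i", actions, V_inv, actions))
    # Pessimistic filter: keep actions whose cost upper bound meets the threshold.
    safe = actions @ mu_hat + beta * widths <= c
    if not safe.any():
        return None  # in practice, fall back to a known safe action
    # Thompson step: sample a reward parameter and maximize over the safe set only.
    theta_tilde = rng.multivariate_normal(theta_hat, v**2 * V_inv)
    scores = np.where(safe, actions @ theta_tilde, -np.inf)
    return int(np.argmax(scores))
```

Because the safe-set filter is applied before the sampled parameter is used, the random exploration of Thompson sampling can never push the learner onto an action whose cost bound exceeds c.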

Stage-wise Conservative Linear Bandits

We study stage-wise conservative linear stochastic bandits: an instance of bandit optimization, which accounts for (unknown) safety constraints that appear in applications such as online advertising and medical trials. At each stage, the…

Machine Learning · Computer Science 2020-10-02 Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

Safe Linear Bandits over Unknown Polytopes

The safe linear bandit problem (SLB) is an online approach to linear programming with unknown objective and unknown roundwise constraints, under stochastic bandit feedback of rewards and safety risks of actions. We study the tradeoffs…

Machine Learning · Computer Science 2024-07-02 Aditya Gangrade, Tianrui Chen, Venkatesh Saligrama

Thompson Sampling for Linearly Constrained Bandits

We address multi-armed bandits (MAB) where the objective is to maximize the cumulative reward under a probabilistic linear constraint. For a few real-world instances of this problem, constrained extensions of the well-known Thompson…

Machine Learning · Computer Science 2020-05-14 Vidit Saxena, Joseph E. Gonzalez, Joakim Jaldén

Thompson Sampling for Multi-Objective Linear Contextual Bandit

We study the multi-objective linear contextual bandit problem, where multiple possibly conflicting objectives must be optimized simultaneously. We propose MOL-TS, the first Thompson Sampling algorithm with Pareto regret…

Machine Learning · Statistics 2025-12-02 Somangchan Park, Heesang Ann, Min-hwan Oh
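Pareto regret is defined relative to the Pareto front: the arms whose reward vectors are not dominated on every objective by another arm. The truncated abstract does not say how MOL-TS selects among such arms. The helper below therefore shows only the dominance test that a multi-objective Thompson sampling method could apply to sampled reward vectors. It is a hypothetical building block, not MOL-TS itself, and the name pareto_front is illustrative.

```python
import numpy as np


def pareto_front(reward_vectors):
    """Return indices of arms whose reward vectors are not Pareto-dominated.

    reward_vectors : (K, m) array; row k holds arm k's (estimated or sampled)
                     rewards on m objectives, larger is better.
    Arm j dominates arm k if it is >= on every objective and > on at least one.
    """
    R = np.asarray(reward_vectors, dtype=float)
    front = []
    for k in range(R.shape[0]):
        others = np.delete(R, k, axis=0)
        dominates_k = np.all(others >= R[k], axis=1) & np.any(others > R[k], axis=1)
        if not dominates_k.any():
            front.append(k)
    return front


# Arm 3 is dominated by arm 2; arms 0, 1, 2 trade off the two objectives.
print(pareto_front([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.4, 0.4]]))  # [0, 1, 2]
```

Ties do not count as domination, so arms with identical reward vectors both stay on the front.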

On Frequentist Regret of Linear Thompson Sampling

This paper studies the stochastic linear bandit problem, where a decision-maker chooses actions from possibly time-dependent sets of vectors in $\mathbb{R}^d$ and receives noisy rewards. The objective is to minimize regret, the difference…

Machine Learning · Computer Science 2023-04-24 Nima Hamidi, Mohsen Bayati

Synthetically Controlled Bandits

This paper presents a new dynamic approach to experiment design in settings where, due to interference or other concerns, experimental units are coarse. 'Region-split' experiments on online platforms are one example of such a setting. The…

Machine Learning · Statistics 2022-02-16 Vivek Farias, Ciamac Moallemi, Tianyi Peng, Andrew Zheng

Linear Thompson Sampling Revisited

We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $\widetilde{O}(d^{3/2}\sqrt{T})$ as in previous results, the proof sheds new light on…

Machine Learning · Statistics 2019-11-06 Marc Abeille, Alessandro Lazaric

Thompson Sampling for Contextual Bandits with Linear Payoffs

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better…

Machine Learning · Computer Science 2014-02-04 Shipra Agrawal, Navin Goyal
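The abstract describes Thompson sampling as a randomized algorithm based on Bayesian ideas. For linear payoffs, a common form keeps a ridge-regression estimate of the parameter, samples a plausible parameter from a Gaussian centered at that estimate, and plays the arm that is best under the sample. The sketch below simulates that loop on a toy problem. The posterior scale v, the noise level, the arm features, and the function name are assumptions chosen for illustration, not the constants or notation from the paper's analysis.

```python
import numpy as np


def linear_thompson_sampling(features, theta_star, T, v=0.5, noise_sd=0.1, seed=0):
    """Simulate Gaussian-posterior linear Thompson sampling on a fixed arm set (sketch).

    features   : (K, d) arm feature vectors b_i
    theta_star : (d,) true parameter, used only to simulate rewards
    v          : posterior scale (a tuning assumption in this sketch)
    Returns the final ridge estimate of theta and the cumulative pseudo-regret.
    """
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    B = np.eye(d)    # B = I + sum_t b_t b_t^T
    f = np.zeros(d)  # f = sum_t b_t r_t
    expected = features @ theta_star
    regret = 0.0
    for _ in range(T):
        B_inv = np.linalg.inv(B)
        mu_hat = B_inv @ f
        # Sample a plausible parameter from N(mu_hat, v^2 B^{-1}) and act greedily on it.
        mu_tilde = rng.multivariate_normal(mu_hat, v**2 * B_inv)
        k = int(np.argmax(features @ mu_tilde))
        b = features[k]
        r = b @ theta_star + rng.normal(0.0, noise_sd)  # noisy linear reward
        B += np.outer(b, b)
        f += r * b
        regret += expected.max() - expected[k]
    return np.linalg.solve(B, f), regret


theta_hat, regret = linear_thompson_sampling(np.eye(3), np.array([0.1, 0.5, 0.9]), T=2000)
```

Randomness in the sampled parameter does the exploring: an arm with few pulls has a wide posterior and is occasionally sampled as best, while a well-estimated inferior arm almost never is.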

Spectral Thompson sampling

Thompson Sampling (TS) has attracted a lot of interest due to its good empirical performance, in particular in computational advertising. Though successful, the tools for its performance analysis appeared only recently. In this paper,…

Machine Learning · Computer Science 2026-04-16 Tomas Kocak, Michal Valko, Remi Munos, Shipra Agrawal

Learning to Sparsify Stochastic Linear Bandits

This paper addresses the problem of learning to sparsify stochastic linear bandits, where a decision-maker sequentially selects actions from a high-dimensional space subject to a sparsity constraint on the number of nonzero elements in the…

Machine Learning · Computer Science 2026-05-12 Zhengmiao Wang, Ming Chi, Zhi-Wei Liu, Lintao Ye, Carla Fabiana Chiasserini

On the Performance of Thompson Sampling on Logistic Bandits

We study the logistic bandit, in which rewards are binary with success probability $\exp(\beta a^\top \theta) / (1 + \exp(\beta a^\top \theta))$ and actions $a$ and coefficients $\theta$ are within the $d$-dimensional unit ball. While prior…

Machine Learning · Statistics 2019-05-14 Shi Dong, Tengyu Ma, Benjamin Van Roy
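The reward model in this abstract is fully specified: a binary reward with success probability $\exp(\beta a^\top \theta) / (1 + \exp(\beta a^\top \theta))$. The sketch below evaluates that probability and draws a reward from it. The function names are hypothetical, and the code uses the algebraically equal form $1/(1+\exp(-z))$ for $z \ge 0$ so the exponential cannot overflow.

```python
import math

import numpy as np


def logistic_success_prob(a, theta, beta):
    """P(reward = 1 | a) = exp(beta a^T theta) / (1 + exp(beta a^T theta))."""
    z = beta * float(np.dot(a, theta))
    # Two equal forms of the logistic function; use the one whose exponent
    # is non-positive so math.exp cannot overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_reward(a, theta, beta, rng):
    """Draw a binary reward with the logistic success probability."""
    return int(rng.random() < logistic_success_prob(a, theta, beta))


print(logistic_success_prob([1.0, 0.0], [1.0, 0.0], beta=2.0))  # 0.8807970779778823
```

Since $a$ and $\theta$ lie in the unit ball, $|a^\top \theta| \le 1$ by Cauchy-Schwarz, so every success probability lies between $1/(1+e^{\beta})$ and $1/(1+e^{-\beta})$; larger $\beta$ makes the response to the action sharper.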

Thompson Sampling-like Algorithms for Stochastic Rising Bandits

Stochastic rising rested bandit (SRRB) is a setting where the arms' expected rewards increase as they are pulled. It models scenarios in which the performances of the different options grow as an effect of an underlying learning process…

Machine Learning · Statistics 2025-05-21 Marco Fiandri, Alberto Maria Metelli, Francesco Trovò

Chained Information-Theoretic bounds and Tight Regret Rate for Linear Bandit Problems

This paper studies the Bayesian regret of a variant of the Thompson-Sampling algorithm for bandit problems. It builds upon the information-theoretic framework of [Russo and Van Roy, 2015] and, more specifically, on the rate-distortion…

Machine Learning · Statistics 2024-03-07 Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, Mikael Skoglund

Cooperative Multi-Agent Constrained Stochastic Linear Bandits

In this study, we explore a collaborative multi-agent stochastic linear bandit setting involving a network of $N$ agents that communicate locally to minimize their collective regret while keeping their expected cost under a specified…

Machine Learning · Computer Science 2024-10-24 Amirhossein Afsharrad, Parisa Oftadeh, Ahmadreza Moradipari, Sanjay Lall

Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits

We consider the stochastic linear contextual bandit problem with high-dimensional features. We analyze the Thompson sampling algorithm using special classes of sparsity-inducing priors (e.g., spike-and-slab) to model the unknown parameter…

Machine Learning · Statistics 2023-01-31 Sunrit Chakraborty, Saptarshi Roy, Ambuj Tewari

Double-Linear Thompson Sampling for Context-Attentive Bandits

In this paper, we analyze and extend an online learning framework known as Context-Attentive Bandit, motivated by various practical applications, from medical diagnosis to dialog systems, where due to observation costs only a small subset…

Machine Learning · Computer Science 2020-10-20 Djallel Bouneffouf, Raphaël Féraud, Sohini Upadhyay, Yasaman Khazaeni, Irina Rish

Further Optimal Regret Bounds for Thompson Sampling

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better…

Machine Learning · Computer Science 2012-09-18 Shipra Agrawal, Navin Goyal

Safe Linear Leveling Bandits

Multi-armed bandits (MAB) are extensively studied in various settings where the objective is to maximize the actions' outcomes (i.e., rewards) over time. Since safety is crucial in many real-world problems, safe versions of MAB…

Machine Learning · Computer Science 2021-12-14 Ilker Demirel, Mehmet Ufuk Ozdemir, Cem Tekin

Power Constrained Nonstationary Bandits with Habituation and Recovery Dynamics

A common challenge for decision makers is selecting actions whose rewards are unknown and evolve over time based on prior policies. For instance, repeated use may reduce an action's effectiveness (habituation), while inactivity may restore…

Machine Learning · Computer Science 2025-11-06 Fengxu Li, Stephanie M. Carpenter, Matthew P. Buman, Yonatan Mintz