Related papers: Constrained Linear Thompson Sampling

200 papers

The design and performance analysis of bandit algorithms in the presence of stage-wise safety or reliability constraints has recently garnered significant interest. In this work, we consider the linear stochastic bandit problem under…

Machine Learning · Computer Science · 2020-03-03 · Ahmadreza Moradipari, Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

We study stage-wise conservative linear stochastic bandits: an instance of bandit optimization, which accounts for (unknown) safety constraints that appear in applications such as online advertising and medical trials. At each stage, the…

Machine Learning · Computer Science · 2020-10-02 · Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

The safe linear bandit problem (SLB) is an online approach to linear programming with unknown objective and unknown roundwise constraints, under stochastic bandit feedback of rewards and safety risks of actions. We study the tradeoffs…

Machine Learning · Computer Science · 2024-07-02 · Aditya Gangrade, Tianrui Chen, Venkatesh Saligrama

We address multi-armed bandits (MAB) where the objective is to maximize the cumulative reward under a probabilistic linear constraint. For a few real-world instances of this problem, constrained extensions of the well-known Thompson…

Machine Learning · Computer Science · 2020-05-14 · Vidit Saxena, Joseph E. Gonzalez, Joakim Jaldén

We study the multi-objective linear contextual bandit problem, where multiple possible conflicting objectives must be optimized simultaneously. We propose MOL-TS, the first Thompson Sampling algorithm with Pareto regret…

Machine Learning · Statistics · 2025-12-02 · Somangchan Park, Heesang Ann, Min-hwan Oh

This paper studies the stochastic linear bandit problem, where a decision-maker chooses actions from possibly time-dependent sets of vectors in $\mathbb{R}^d$ and receives noisy rewards. The objective is to minimize regret, the difference…

Machine Learning · Computer Science · 2023-04-24 · Nima Hamidi, Mohsen Bayati

This paper presents a new dynamic approach to experiment design in settings where, due to interference or other concerns, experimental units are coarse. 'Region-split' experiments on online platforms are one example of such a setting. The…

Machine Learning · Statistics · 2022-02-16 · Vivek Farias, Ciamac Moallemi, Tianyi Peng, Andrew Zheng

We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $\widetilde{O}(d^{3/2}\sqrt{T})$ as in previous results, the proof sheds new light on…

Machine Learning · Statistics · 2019-11-06 · Marc Abeille, Alessandro Lazaric

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better…

Machine Learning · Computer Science · 2014-02-04 · Shipra Agrawal, Navin Goyal

Thompson Sampling (TS) has attracted a lot of interest due to its good empirical performance, in particular in computational advertising. Though successful, the tools for its performance analysis appeared only recently. In this paper,…

Machine Learning · Computer Science · 2026-04-16 · Tomas Kocak, Michal Valko, Remi Munos, Shipra Agrawal

This paper addresses the problem of learning to sparsify stochastic linear bandits, where a decision-maker sequentially selects actions from a high-dimensional space subject to a sparsity constraint on the number of nonzero elements in the…

Machine Learning · Computer Science · 2026-05-12 · Zhengmiao Wang, Ming Chi, Zhi-Wei Liu, Lintao Ye, Carla Fabiana Chiasserini

We study the logistic bandit, in which rewards are binary with success probability $\exp(\beta a^\top \theta) / (1 + \exp(\beta a^\top \theta))$ and actions $a$ and coefficients $\theta$ are within the $d$-dimensional unit ball. While prior…

Machine Learning · Statistics · 2019-05-14 · Shi Dong, Tengyu Ma, Benjamin Van Roy

Stochastic rising rested bandit (SRRB) is a setting where the arms' expected rewards increase as they are pulled. It models scenarios in which the performances of the different options grow as an effect of an underlying learning process…

Machine Learning · Statistics · 2025-05-21 · Marco Fiandri, Alberto Maria Metelli, Francesco Trovò

This paper studies the Bayesian regret of a variant of the Thompson-Sampling algorithm for bandit problems. It builds upon the information-theoretic framework of [Russo and Van Roy, 2015] and, more specifically, on the rate-distortion…

Machine Learning · Statistics · 2024-03-07 · Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, Mikael Skoglund

In this study, we explore a collaborative multi-agent stochastic linear bandit setting involving a network of $N$ agents that communicate locally to minimize their collective regret while keeping their expected cost under a specified…

Machine Learning · Computer Science · 2024-10-24 · Amirhossein Afsharrad, Parisa Oftadeh, Ahmadreza Moradipari, Sanjay Lall

We consider the stochastic linear contextual bandit problem with high-dimensional features. We analyze the Thompson sampling algorithm using special classes of sparsity-inducing priors (e.g., spike-and-slab) to model the unknown parameter…

Machine Learning · Statistics · 2023-01-31 · Sunrit Chakraborty, Saptarshi Roy, Ambuj Tewari

In this paper, we analyze and extend an online learning framework known as Context-Attentive Bandit, motivated by various practical applications, from medical diagnosis to dialog systems, where due to observation costs only a small subset…

Machine Learning · Computer Science · 2020-10-20 · Djallel Bouneffouf, Raphaël Féraud, Sohini Upadhyay, Yasaman Khazaeni, Irina Rish

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better…

Machine Learning · Computer Science · 2012-09-18 · Shipra Agrawal, Navin Goyal

Multi-armed bandits (MAB) are extensively studied in various settings where the objective is to maximize the actions' outcomes (i.e., rewards) over time. Since safety is crucial in many real-world problems, safe versions of MAB…

Machine Learning · Computer Science · 2021-12-14 · Ilker Demirel, Mehmet Ufuk Ozdemir, Cem Tekin

A common challenge for decision makers is selecting actions whose rewards are unknown and evolve over time based on prior policies. For instance, repeated use may reduce an action's effectiveness (habituation), while inactivity may restore…

Machine Learning · Computer Science · 2025-11-06 · Fengxu Li, Stephanie M. Carpenter, Matthew P. Buman, Yonatan Mintz