Related papers: Thompson Sampling for Linear-Quadratic Control Pro…

Linear Thompson Sampling Revisited

We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $\widetilde{O}(d^{3/2}\sqrt{T})$ as in previous results, the proof sheds new light on…

Machine Learning · Statistics 2019-11-06 Marc Abeille, Alessandro Lazaric

Thompson Sampling Achieves $\tilde O(\sqrt{T})$ Regret in Linear Quadratic Control

Thompson Sampling (TS) is an efficient method for decision-making under uncertainty, where an action is sampled from a carefully prescribed distribution which is updated based on the observed data. In this work, we study the problem of…

Machine Learning · Computer Science 2022-06-20 Taylan Kargin, Sahin Lale, Kamyar Azizzadenesheli, Anima Anandkumar, Babak Hassibi

On Frequentist Regret of Linear Thompson Sampling

This paper studies the stochastic linear bandit problem, where a decision-maker chooses actions from possibly time-dependent sets of vectors in $\mathbb{R}^d$ and receives noisy rewards. The objective is to minimize regret, the difference…

Machine Learning · Computer Science 2023-04-24 Nima Hamidi, Mohsen Bayati

Thompson Sampling for Complex Bandit Problems

We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, where the decision maker plays a complex action rather than a basic arm in each round. The reward of the complex action is some function of…

Machine Learning · Statistics 2013-11-05 Aditya Gopalan, Shie Mannor, Yishay Mansour

Thompson Sampling for Linearly Constrained Bandits

We address multi-armed bandits (MAB) where the objective is to maximize the cumulative reward under a probabilistic linear constraint. For a few real-world instances of this problem, constrained extensions of the well-known Thompson…

Machine Learning · Computer Science 2020-05-14 Vidit Saxena, Joseph E. Gonzalez, Joakim Jaldén

Rate-Optimal Regret for the Safe Learning-based Control of the Constrained Linear Quadratic Regulator

We study the problem of adaptive control of the stochastic linear quadratic regulator (LQR) with constraints that must be satisfied at every time step. Prior work on the multidimensional problem has shown $\tilde{O}(T^{2/3})$ regret and…

Optimization and Control · Mathematics 2026-05-08 Spencer Hutchinson, Nanfei Jiang, Mahnoosh Alizadeh

Thompson sampling for linear quadratic mean-field teams

We consider optimal control of an unknown multi-agent linear quadratic (LQ) system where the dynamics and the cost are coupled across the agents through the mean-field (i.e., empirical mean) of the states and controls. Directly using…

Systems and Control · Electrical Eng. & Systems 2020-11-11 Mukul Gagrani, Sagar Sudhakara, Aditya Mahajan, Ashutosh Nayyar, Yi Ouyang

Generalized Regret Analysis of Thompson Sampling using Fractional Posteriors

Thompson sampling (TS) is one of the most popular and earliest algorithms to solve stochastic multi-armed bandit problems. We consider a variant of TS, named $\alpha$-TS, where we use a fractional or $\alpha$-posterior ($\alpha\in(0,1)$)…

Machine Learning · Statistics 2023-09-13 Prateek Jaiswal, Debdeep Pati, Anirban Bhattacharya, Bani K. Mallick

Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning

Thompson sampling (TS) is widely used in sequential decision making due to its ease of use and appealing empirical performance. However, many existing analytical and empirical results for TS rely on restrictive assumptions on reward…

Machine Learning · Computer Science 2023-06-16 Amin Karbasi, Nikki Lijing Kuang, Yi-An Ma, Siddharth Mitra

Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning

Thompson Sampling has been widely used for contextual bandit problems due to the flexibility of its modeling power. However, a general theory for this class of methods in the frequentist setting is still lacking. In this paper, we present a…

Machine Learning · Computer Science 2021-10-05 Tong Zhang

Analysis of Thompson Sampling for the multi-armed bandit problem

The multi-armed bandit problem is a popular model for studying exploration/exploitation trade-off in sequential decision problems. Many algorithms are now available for this well-studied problem. One of the earliest algorithms, given by W.…

Machine Learning · Computer Science 2012-04-10 Shipra Agrawal, Navin Goyal

Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret

We propose a novel Thompson sampling algorithm that learns linear quadratic regulators (LQR) with a Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a carefully designed preconditioner and incorporates a…

Machine Learning · Statistics 2025-05-30 Yeoneung Kim, Gihun Kim, Jiwhan Park, Insoon Yang

Safe Linear Thompson Sampling with Side Information

The design and performance analysis of bandit algorithms in the presence of stage-wise safety or reliability constraints has recently garnered significant interest. In this work, we consider the linear stochastic bandit problem under…

Machine Learning · Computer Science 2020-03-03 Ahmadreza Moradipari, Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

Variational Bayesian Optimistic Sampling

We consider online sequential decision problems where an agent must balance exploration and exploitation. We derive a set of Bayesian 'optimistic' policies which, in the stochastic multi-armed bandit case, includes the Thompson sampling…

Machine Learning · Statistics 2021-11-01 Brendan O'Donoghue, Tor Lattimore

Thompson Sampling in Non-Episodic Restless Bandits

Restless bandit problems assume time-varying reward distributions of the arms, which adds flexibility to the model but makes the analysis more challenging. We study learning algorithms over the unknown reward distributions and prove a…

Machine Learning · Computer Science 2019-10-15 Young Hun Jung, Marc Abeille, Ambuj Tewari

Thompson Sampling for Contextual Bandits with Linear Payoffs

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better…

Machine Learning · Computer Science 2014-02-04 Shipra Agrawal, Navin Goyal

Thompson Sampling for Multi-Objective Linear Contextual Bandit

We study the multi-objective linear contextual bandit problem, where multiple possible conflicting objectives must be optimized simultaneously. We propose MOL-TS, the first Thompson Sampling algorithm with Pareto regret…

Machine Learning · Statistics 2025-12-02 Somangchan Park, Heesang Ann, Min-hwan Oh

Efficient and Adaptive Posterior Sampling Algorithms for Bandits

We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards. As the existing problem-dependent regret bound for Thompson Sampling with Gaussian priors [Agrawal and Goyal, 2017] is vacuous when $T \le 288 e^{64}$,…

Machine Learning · Computer Science 2024-05-03 Bingshan Hu, Zhiming Huang, Tianyue H. Zhang, Mathias Lécuyer, Nidhi Hegde

Thompson Sampling Algorithms for Mean-Variance Bandits

The multi-armed bandit (MAB) problem is a classical learning task that exemplifies the exploration-exploitation tradeoff. However, standard formulations do not take into account risk. In online decision making systems, risk is a…

Machine Learning · Computer Science 2020-08-04 Qiuyu Zhu, Vincent Y. F. Tan

Neural Thompson Sampling

Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems. In this paper, we propose a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both…

Machine Learning · Computer Science 2022-01-03 Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu