Related papers: Constrained Thompson Sampling for Wireless Link Op…

Joint Link Rate Selection and Channel State Change Detection in Block-Fading Channels

In this work, we consider the problem of transmission rate selection for a discrete time point-to-point block fading wireless communication link. The wireless channel remains constant within the channel coherence time but can change rapidly…

Networking and Internet Architecture · Computer Science 2021-08-24 Haoyue Tang, Xinyu Hou, Jintao Wang, Jian Song

Constrained Thompson Sampling for Real-Time Electricity Pricing with Grid Reliability Constraints

We consider the problem of an aggregator attempting to learn customers' load flexibility models while implementing a load shaping program by means of broadcasting daily dispatch signals. We adopt a multi-armed bandit formulation to account…

Systems and Control · Electrical Eng. & Systems 2020-06-19 Nathaniel Tucker, Ahmadreza Moradipari, Mahnoosh Alizadeh

Thompson Sampling for Linearly Constrained Bandits

We address multi-armed bandits (MAB) where the objective is to maximize the cumulative reward under a probabilistic linear constraint. For a few real-world instances of this problem, constrained extensions of the well-known Thompson…

Machine Learning · Computer Science 2020-05-14 Vidit Saxena, Joseph E. Gonzalez, Joakim Jaldén

Reinforcement Learning for Efficient and Tuning-Free Link Adaptation

Wireless links adapt the data transmission parameters to the dynamic channel state -- this is called link adaptation. Classical link adaptation relies on tuning parameters that are challenging to configure for optimal link performance.…

Signal Processing · Electrical Eng. & Systems 2021-05-06 Vidit Saxena, Hugo Tullberg, Joakim Jaldén

Safe Linear Thompson Sampling with Side Information

The design and performance analysis of bandit algorithms in the presence of stage-wise safety or reliability constraints has recently garnered significant interest. In this work, we consider the linear stochastic bandit problem under…

Machine Learning · Computer Science 2020-03-03 Ahmadreza Moradipari, Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

MOTS: Minimax Optimal Thompson Sampling

Thompson sampling is one of the most widely used algorithms for many online decision problems, due to its simplicity in implementation and superior empirical performance over other state-of-the-art methods. Despite its popularity and…

Machine Learning · Computer Science 2020-10-02 Tianyuan Jin, Pan Xu, Jieming Shi, Xiaokui Xiao, Quanquan Gu

Asymptotically Optimal Bandits under Weighted Information

We study the problem of regret minimization in a multi-armed bandit setup where the agent is allowed to play multiple arms at each round by spreading the resources usually allocated to only one arm. At each iteration the agent selects a…

Machine Learning · Computer Science 2021-06-01 Matias I. Müller, Cristian R. Rojas

Chained Information-Theoretic bounds and Tight Regret Rate for Linear Bandit Problems

This paper studies the Bayesian regret of a variant of the Thompson-Sampling algorithm for bandit problems. It builds upon the information-theoretic framework of [Russo and Van Roy, 2015] and, more specifically, on the rate-distortion…

Machine Learning · Statistics 2024-03-07 Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, Mikael Skoglund

Thompson Sampling with Unrestricted Delays

We investigate properties of Thompson Sampling in the stochastic multi-armed bandit problem with delayed feedback. In a setting with i.i.d. delays, we establish, to our knowledge, the first regret bounds for Thompson Sampling with arbitrary…

Machine Learning · Computer Science 2022-05-24 Han Wu, Stefan Wager

Satisficing in Time-Sensitive Bandit Learning

Much of the recent literature on bandit learning focuses on algorithms that aim to converge on an optimal action. One shortcoming is that this orientation does not account for time sensitivity, which can play a crucial role when learning an…

Machine Learning · Computer Science 2020-01-09 Daniel Russo, Benjamin Van Roy

Thompson Sampling for Multi-Objective Linear Contextual Bandit

We study the multi-objective linear contextual bandit problem, where multiple possibly conflicting objectives must be optimized simultaneously. We propose MOL-TS, the first Thompson Sampling algorithm with Pareto regret…

Machine Learning · Statistics 2025-12-02 Somangchan Park, Heesang Ann, Min-hwan Oh

Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning

Thompson sampling (TS) is widely used in sequential decision making due to its ease of use and appealing empirical performance. However, many existing analytical and empirical results for TS rely on restrictive assumptions on reward…

Machine Learning · Computer Science 2023-06-16 Amin Karbasi, Nikki Lijing Kuang, Yi-An Ma, Siddharth Mitra

Analyzing and Enhancing Queue Sampling for Energy-Efficient Remote Control of Bandits

In recent years, the integration of communication and control systems has gained significant traction in various domains, ranging from autonomous vehicles to industrial automation and beyond. Multi-armed bandit (MAB) algorithms have proven…

Systems and Control · Electrical Eng. & Systems 2024-05-16 Hiba Dakdouk, Mohamed Sana, Mattia Merluzzi

Thompson Sampling for Combinatorial Network Optimization in Unknown Environments

Influence maximization, adaptive routing, and dynamic spectrum allocation all require choosing the right action from a large set of alternatives. Thanks to the advances in combinatorial optimization, these and many similar problems can be…

Machine Learning · Computer Science 2020-12-29 Alihan Hüyük, Cem Tekin

Constrained Linear Thompson Sampling

We study safe linear bandits (SLBs), where an agent selects actions from a convex set to maximize an unknown linear objective subject to unknown linear constraints in each round. Existing methods for SLBs provide strong regret guarantees,…

Machine Learning · Computer Science 2025-06-19 Aditya Gangrade, Venkatesh Saligrama

Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits

We consider a Bayesian budgeted multi-armed bandit problem, in which each arm consumes a different amount of resources when selected and there is a budget constraint on the total amount of resources that can be used. Budgeted Thompson…

Machine Learning · Computer Science 2024-08-29 Woojin Jeong, Seungki Min

Order Optimal Regret Bounds for Sharpe Ratio Optimization under Thompson Sampling

In this paper, we study sequential decision-making for maximizing the Sharpe ratio (SR) in a stochastic multi-armed bandit (MAB) setting. Unlike standard bandit formulations that maximize cumulative reward, SR optimization requires…

Machine Learning · Computer Science 2026-04-02 Mohammad Taha Shah, Sabrina Khurshid, Gourab Ghatak

Thompson Sampling for Contextual Bandits with Linear Payoffs

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better…

Machine Learning · Computer Science 2014-02-04 Shipra Agrawal, Navin Goyal

Joint Model Pruning and Resource Allocation for Wireless Time-triggered Federated Learning

Time-triggered federated learning, in contrast to conventional event-based federated learning, organizes users into tiers based on fixed time intervals. However, this network still faces challenges due to a growing number of devices and…

Machine Learning · Computer Science 2025-05-12 Xinlu Zhang, Yansha Deng, Toktam Mahmoodi

Lenient Regret for Multi-Armed Bandits

We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took. While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between…

Machine Learning · Computer Science 2021-09-14 Nadav Merlis, Shie Mannor