Related papers: A Linear Programming Relaxation and a Heuristic fo…

Restless Linear Bandits

A more general formulation of the linear bandit problem is considered to allow for dependencies over time. Specifically, it is assumed that there exists an unknown $\mathbb{R}^d$-valued stationary $\varphi$-mixing sequence of parameters…

Machine Learning · Statistics 2024-05-20 Azadeh Khaleghi

Bandit Linear Control

We consider the problem of controlling a known linear dynamical system under stochastic noise, adversarially chosen costs, and bandit feedback. Unlike the full feedback setting where the entire cost function is revealed after each decision,…

Machine Learning · Computer Science 2020-07-03 Asaf Cassel, Tomer Koren

Lagrangian Relaxation for Multi-Action Partially Observable Restless Bandits: Heuristic Policies and Indexability

Partially observable restless multi-armed bandits have found numerous applications including in recommendation systems, communication systems, public healthcare outreach systems, and in operations research. We study multi-action partially…

Machine Learning · Computer Science 2025-09-03 Rahul Meshram, Kesav Kaza

A Last Switch Dependent Analysis of Satiation and Seasonality in Bandits

Motivated by the fact that humans like some level of unpredictability or novelty, and might therefore get quickly bored when interacting with a stationary policy, we introduce a novel non-stationary bandit problem, where the expected reward…

Machine Learning · Computer Science 2022-03-08 Pierre Laforgue, Giulia Clerici, Nicolò Cesa-Bianchi, Ran Gilad-Bachrach

Optimal Policies for Observing Time Series and Related Restless Bandit Problems

The trade-off between the cost of acquiring and processing data, and uncertainty due to a lack of data is fundamental in machine learning. A basic instance of this trade-off is the problem of deciding when to make noisy and costly…

Machine Learning · Statistics 2017-03-30 Christopher R. Dance, Tomi Silander

Revisiting Weighted Strategy for Non-stationary Parametric Bandits

Non-stationary parametric bandits have attracted much attention recently. There are three principled ways to deal with non-stationarity, including sliding-window, weighted, and restart strategies. As many non-stationary environments exhibit…

Machine Learning · Computer Science 2023-06-08 Jing Wang, Peng Zhao, Zhi-Hua Zhou

Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms to handle the explore-exploit trade-off in linear bandits problems as well as generalized…

Machine Learning · Computer Science 2020-10-09 Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar

Continuous-Time Multi-Armed Bandits with Controlled Restarts

Time-constrained decision processes have been ubiquitous in many fundamental applications in physics, biology and computer science. Recently, restart strategies have gained significant attention for boosting the efficiency of…

Machine Learning · Computer Science 2020-07-02 Semih Cayci, Atilla Eryilmaz, R. Srikant

Phase Transitions in Bandits with Switching Constraints

We consider the classical stochastic multi-armed bandit problem with a constraint that limits the total cost incurred by switching between actions to be no larger than a given switching budget. For this problem, we prove matching upper and…

Machine Learning · Computer Science 2021-03-22 David Simchi-Levi, Yunzong Xu

Optimal Adaptive Learning in Uncontrolled Restless Bandit Problems

In this paper we consider the problem of learning the optimal policy for uncontrolled restless bandit problems. In an uncontrolled restless bandit problem, there is a finite set of arms, each of which when pulled yields a positive reward.…

Optimization and Control · Mathematics 2015-01-30 Cem Tekin, Mingyan Liu

Energy Regularized RNNs for Solving Non-Stationary Bandit Problems

We consider a Multi-Armed Bandit problem in which the rewards are non-stationary and are dependent on past actions and potentially on past contexts. At the heart of our method, we employ a recurrent neural network, which models these…

Machine Learning · Computer Science 2023-03-29 Michael Rotman, Lior Wolf

On Slowly-varying Non-stationary Bandits

We consider minimisation of dynamic regret in non-stationary bandits with a slowly varying property. Namely, we assume that arms' rewards are stochastic and independent over time, but that the absolute difference between the expected…

Machine Learning · Computer Science 2021-10-26 Ramakrishnan Krishnamurthy, Aditya Gopalan

Parameter-Free Dynamic Regret for Unconstrained Linear Bandits

We study dynamic regret minimization in unconstrained adversarial linear bandit problems. In this setting, a learner must minimize the cumulative loss relative to an arbitrary sequence of comparators…

Machine Learning · Computer Science 2026-03-30 Alberto Rumi, Andrew Jacobsen, Nicolò Cesa-Bianchi, Fabio Vitale

Dynamic programming for discrete-time finite horizon optimal switching problems with negative switching costs

This paper studies a discrete-time optimal switching problem on a finite horizon. The underlying model has a running reward, terminal reward and signed (positive and negative) switching costs. Using the martingale approach to optimal…

Optimization and Control · Mathematics 2016-10-17 Randall Martyr

Bandits with Feedback Graphs and Switching Costs

We study the adversarial multi-armed bandit problem where partial observations are available and where, in addition to the loss incurred for each action, a switching cost is incurred for shifting to a new action. All previously known…

Machine Learning · Computer Science 2020-03-24 Raman Arora, Teodor V. Marinov, Mehryar Mohri

Revisiting Weighted Strategy for Non-stationary Parametric Bandits and MDPs

Non-stationary parametric bandits have attracted much attention recently. There are three principled ways to deal with non-stationarity, including sliding-window, weighted, and restart strategies. As many non-stationary environments exhibit…

Machine Learning · Computer Science 2026-01-06 Jing Wang, Peng Zhao, Zhi-Hua Zhou

A faster index algorithm and a computational study for bandits with switching costs

We address the intractable multi-armed bandit problem with switching costs, for which Asawa and Teneketzis introduced in [M. Asawa and D. Teneketzis. 1996. Multi-armed bandits with switching penalties. IEEE Trans. Automat. Control, 41…

Optimization and Control · Mathematics 2023-04-05 José Niño-Mora

BALLAST: Bandit-Assisted Learning for Latency-Aware Stable Timeouts in Raft

Randomized election timeouts are a simple and effective liveness heuristic for Raft, but they become brittle under long-tail latency, jitter, and partition recovery, where repeated split votes can inflate unavailability. This paper presents…

Machine Learning · Computer Science 2025-12-25 Qizhi Wang

Risk-Aware Decision Making in Restless Bandits: Theory and Algorithms for Planning and Learning

In restless bandits, a central agent is tasked with optimally distributing limited resources across several bandits (arms), with each arm being a Markov decision process. In this work, we generalize the traditional restless bandits problem…

Machine Learning · Computer Science 2026-02-20 Nima Akbarzadeh, Yossiri Adulyasak, Erick Delage

Irrevocable Multi-Armed Bandit Policies

This paper considers the multi-armed bandit problem with multiple simultaneous arm pulls. We develop a new 'irrevocable' heuristic for this problem. In particular, we do not allow recourse to arms that were pulled at some point in the past…

Optimization and Control · Mathematics 2008-06-26 Vivek Farias, Ritesh Madan