Related papers: A Linear Programming Relaxation and a Heuristic fo…

200 papers

A more general formulation of the linear bandit problem is considered to allow for dependencies over time. Specifically, it is assumed that there exists an unknown $\mathbb{R}^d$-valued stationary $\varphi$-mixing sequence of parameters…

Machine Learning · Statistics 2024-05-20 Azadeh Khaleghi

We consider the problem of controlling a known linear dynamical system under stochastic noise, adversarially chosen costs, and bandit feedback. Unlike the full feedback setting where the entire cost function is revealed after each decision,…

Machine Learning · Computer Science 2020-07-03 Asaf Cassel, Tomer Koren

Partially observable restless multi-armed bandits have found numerous applications including in recommendation systems, communication systems, public healthcare outreach systems, and in operations research. We study multi-action partially…

Machine Learning · Computer Science 2025-09-03 Rahul Meshram, Kesav Kaza

Motivated by the fact that humans like some level of unpredictability or novelty, and might therefore get quickly bored when interacting with a stationary policy, we introduce a novel non-stationary bandit problem, where the expected reward…

Machine Learning · Computer Science 2022-03-08 Pierre Laforgue, Giulia Clerici, Nicolò Cesa-Bianchi, Ran Gilad-Bachrach

The trade-off between the cost of acquiring and processing data, and uncertainty due to a lack of data is fundamental in machine learning. A basic instance of this trade-off is the problem of deciding when to make noisy and costly…

Machine Learning · Statistics 2017-03-30 Christopher R. Dance, Tomi Silander

Non-stationary parametric bandits have attracted much attention recently. There are three principled ways to deal with non-stationarity, including sliding-window, weighted, and restart strategies. As many non-stationary environments exhibit…

Machine Learning · Computer Science 2023-06-08 Jing Wang, Peng Zhao, Zhi-Hua Zhou

Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms to handle the explore-exploit trade-off in linear bandits problems as well as generalized…

Machine Learning · Computer Science 2020-10-09 Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar

Time-constrained decision processes have been ubiquitous in many fundamental applications in physics, biology and computer science. Recently, restart strategies have gained significant attention for boosting the efficiency of…

Machine Learning · Computer Science 2020-07-02 Semih Cayci, Atilla Eryilmaz, R. Srikant

We consider the classical stochastic multi-armed bandit problem with a constraint that limits the total cost incurred by switching between actions to be no larger than a given switching budget. For this problem, we prove matching upper and…

Machine Learning · Computer Science 2021-03-22 David Simchi-Levi, Yunzong Xu

In this paper we consider the problem of learning the optimal policy for uncontrolled restless bandit problems. In an uncontrolled restless bandit problem, there is a finite set of arms, each of which when pulled yields a positive reward.…

Optimization and Control · Mathematics 2015-01-30 Cem Tekin, Mingyan Liu

We consider a Multi-Armed Bandit problem in which the rewards are non-stationary and are dependent on past actions and potentially on past contexts. At the heart of our method, we employ a recurrent neural network, which models these…

Machine Learning · Computer Science 2023-03-29 Michael Rotman, Lior Wolf

We consider minimisation of dynamic regret in non-stationary bandits with a slowly varying property. Namely, we assume that arms' rewards are stochastic and independent over time, but that the absolute difference between the expected…

Machine Learning · Computer Science 2021-10-26 Ramakrishnan Krishnamurthy, Aditya Gopalan

We study dynamic regret minimization in unconstrained adversarial linear bandit problems. In this setting, a learner must minimize the cumulative loss relative to an arbitrary sequence of comparators…

Machine Learning · Computer Science 2026-03-30 Alberto Rumi, Andrew Jacobsen, Nicolò Cesa-Bianchi, Fabio Vitale

This paper studies a discrete-time optimal switching problem on a finite horizon. The underlying model has a running reward, terminal reward and signed (positive and negative) switching costs. Using the martingale approach to optimal…

Optimization and Control · Mathematics 2016-10-17 Randall Martyr

We study the adversarial multi-armed bandit problem where partial observations are available and where, in addition to the loss incurred for each action, a switching cost is incurred for shifting to a new action. All previously known…

Machine Learning · Computer Science 2020-03-24 Raman Arora, Teodor V. Marinov, Mehryar Mohri

Non-stationary parametric bandits have attracted much attention recently. There are three principled ways to deal with non-stationarity, including sliding-window, weighted, and restart strategies. As many non-stationary environments exhibit…

Machine Learning · Computer Science 2026-01-06 Jing Wang, Peng Zhao, Zhi-Hua Zhou

We address the intractable multi-armed bandit problem with switching costs, for which Asawa and Teneketzis introduced in [M. Asawa and D. Teneketzis. 1996. Multi-armed bandits with switching penalties. IEEE Trans. Automat. Control, 41…

Optimization and Control · Mathematics 2023-04-05 José Niño-Mora

Randomized election timeouts are a simple and effective liveness heuristic for Raft, but they become brittle under long-tail latency, jitter, and partition recovery, where repeated split votes can inflate unavailability. This paper presents…

Machine Learning · Computer Science 2025-12-25 Qizhi Wang

In restless bandits, a central agent is tasked with optimally distributing limited resources across several bandits (arms), with each arm being a Markov decision process. In this work, we generalize the traditional restless bandits problem…

Machine Learning · Computer Science 2026-02-20 Nima Akbarzadeh, Yossiri Adulyasak, Erick Delage

This paper considers the multi-armed bandit problem with multiple simultaneous arm pulls. We develop a new 'irrevocable' heuristic for this problem. In particular, we do not allow recourse to arms that were pulled at some point in the past…

Optimization and Control · Mathematics 2008-06-26 Vivek Farias, Ritesh Madan