Related papers: An Optimization-based Algorithm for Non-stationary…

200 papers

We propose the first contextual bandit algorithm that is parameter-free, efficient, and optimal in terms of dynamic regret. Specifically, our algorithm achieves dynamic regret $\mathcal{O}(\min\{\sqrt{ST},…

Machine Learning · Computer Science 2019-06-19 Yifang Chen, Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei

Most contextual bandit algorithms minimize regret against the best fixed policy, a questionable benchmark for non-stationary environments that are ubiquitous in applications. In this work, we develop several efficient contextual bandit…

Machine Learning · Computer Science 2019-04-05 Haipeng Luo, Chen-Yu Wei, Alekh Agarwal, John Langford

This paper studies a non-stationary kernelized bandit (KB) problem, also called time-varying Bayesian optimization, where one seeks to minimize the regret under an unknown reward function that varies over time. In particular, we focus on a…

Machine Learning · Computer Science 2024-10-22 Shogo Iwazaki, Shion Takeno

Non-stationary multi-armed bandits enable agents to adapt to changing environments by incorporating mechanisms to detect and respond to shifts in reward distributions, making them well-suited for dynamic settings. However, existing…

Machine Learning · Computer Science 2025-09-19 Shaoang Li, Jian Li

We propose a black-box reduction that turns a certain reinforcement learning algorithm with optimal regret in a (near-)stationary environment into another algorithm with optimal dynamic regret in a non-stationary environment, importantly…

Machine Learning · Computer Science 2021-09-07 Chen-Yu Wei, Haipeng Luo

We investigate the non-stationary stochastic linear bandit problem where the reward distribution evolves each round. Existing algorithms characterize the non-stationarity by the total variation budget $B_K$, which is the summation of the…

Machine Learning · Computer Science 2024-03-19 Zhiyong Wang, Jize Xie, Yi Chen, John C. S. Lui, Dongruo Zhou

We introduce data-driven decision-making algorithms that achieve state-of-the-art \emph{dynamic regret} bounds for non-stationary bandit settings. These settings capture applications such as advertisement allocation, dynamic pricing, and…

Machine Learning · Computer Science 2021-03-19 Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

We consider minimisation of dynamic regret in non-stationary bandits with a slowly varying property. Namely, we assume that arms' rewards are stochastic and independent over time, but that the absolute difference between the expected…

Machine Learning · Computer Science 2021-10-26 Ramakrishnan Krishnamurthy, Aditya Gopalan

Contextual bandits are a rich model for sequential decision making given side information, with important applications, e.g., in recommender systems. We propose novel algorithms for contextual bandits harnessing neural networks to…

Machine Learning · Statistics 2022-03-01 Parnian Kassraie, Andreas Krause

In this paper, we investigate the non-stationary combinatorial semi-bandit problem, both in the switching case and in the dynamic case. In the general case where (a) the reward function is non-linear, (b) arms may be probabilistically…

Machine Learning · Computer Science 2021-06-22 Wei Chen, Liwei Wang, Haoyu Zhao, Kai Zheng

In this paper we study the non-stationary stochastic optimization question with bandit feedback and dynamic regret measures. The seminal work of Besbes et al. (2015) shows that, when aggregated function changes is known a priori, a simple…

Machine Learning · Statistics 2022-10-12 Yining Wang

This paper investigates the problem of non-stationary linear bandits, where the unknown regression parameter is evolving over time. Existing studies develop various algorithms and show that they enjoy an…

Machine Learning · Computer Science 2021-12-23 Peng Zhao, Lijun Zhang, Yuan Jiang, Zhi-Hua Zhou

We study the problem of \emph{dynamic regret minimization} in $K$-armed Dueling Bandits under non-stationary or time varying preferences. This is an online learning setup where the agent chooses a pair of items at each round and observes…

Machine Learning · Computer Science 2022-06-14 Aadirupa Saha, Shubham Gupta

We study the kernelized bandit problem, that involves designing an adaptive strategy for querying a noisy zeroth-order-oracle to efficiently learn about the optimizer of an unknown function $f$ with a norm bounded by $M<\infty$ in a…

Machine Learning · Computer Science 2022-03-15 Shubhanshu Shekhar, Tara Javidi

We study the problem of worst case regret in piecewise stationary multi armed bandits. While the minimax theory for stationary bandits is well established, understanding analogous limits in time-varying settings is challenging. Existing…

Machine Learning · Computer Science 2025-11-11 Gal Mendelson, Eyal Tadmor

We introduce algorithms that achieve state-of-the-art \emph{dynamic regret} bounds for non-stationary linear stochastic bandit setting. It captures natural applications such as dynamic pricing and ads allocation in a changing environment.…

Machine Learning · Computer Science 2021-07-20 Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

Recently, several studies (Zhou et al., 2021a; Zhang et al., 2021b; Kim et al., 2021; Zhou and Gu, 2022) have provided variance-dependent regret bounds for linear contextual bandits, which interpolates the regret for the worst-case regime…

Machine Learning · Computer Science 2023-02-22 Heyang Zhao, Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

In this paper, we study the MNL-Bandit problem in a non-stationary environment and present an algorithm with a worst-case expected regret of $\tilde{O}\left( \min \left\{ \sqrt{NTL}\;,\; N^{\frac{1}{3}}(\Delta_{\infty}^{K})^{\frac{1}{3}}…

Machine Learning · Computer Science 2023-06-05 Ayoub Foussoul, Vineet Goyal, Varun Gupta

We study the problem of non-stationary dueling bandits and provide the first adaptive dynamic regret algorithm for this problem. The only two existing attempts in this line of work fall short across multiple dimensions, including…

Machine Learning · Computer Science 2022-10-27 Thomas Kleine Buening, Aadirupa Saha

We study an infinite-armed bandit problem where actions' mean rewards are initially sampled from a reservoir distribution. Most prior works in this setting focused on stationary rewards (Berry et al., 1997; Wang et al., 2008; Bonald and…

Machine Learning · Computer Science 2025-02-04 Joe Suk, Jung-hun Kim