Related papers: Contextual Linear Bandits with Delay as Payoff

Delay as Payoff in MAB

In this paper, we investigate a variant of the classical stochastic Multi-armed Bandit (MAB) problem, where the payoff received by an agent (either cost or reward) is both delayed and directly corresponds to the magnitude of the delay.…

Machine Learning · Computer Science 2024-10-16 Ofir Schlisselberg, Ido Cohen, Tal Lancewicki, Yishay Mansour

Delayed Feedback in Generalised Linear Bandits Revisited

The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, the stringent requirement for…

Machine Learning · Computer Science 2023-04-12 Benjamin Howson, Ciara Pike-Burke, Sarah Filippi

Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs

In this paper, we study the problem of stochastic linear bandits with finite action sets. Most existing work assumes the payoffs are bounded or sub-Gaussian, which may be violated in some scenarios such as financial markets. To settle…

Machine Learning · Computer Science 2020-04-29 Bo Xue, Guanghui Wang, Yimu Wang, Lijun Zhang

A Time and Space Efficient Algorithm for Contextual Linear Bandits

We consider a multi-armed bandit problem where payoffs are a linear function of an observed stochastic contextual variable. In the scenario where there exists a gap between optimal and suboptimal rewards, several algorithms have been…

Data Structures and Algorithms · Computer Science 2014-07-08 José Bento, Stratis Ioannidis, S. Muthukrishnan, Jinyun Yan

Semiparametric Contextual Bandits

This paper studies semiparametric contextual bandits, a generalization of the linear stochastic bandit problem where the reward for an action is modeled as a linear function of known action features confounded by a non-linear…

Machine Learning · Statistics 2018-07-17 Akshay Krishnamurthy, Zhiwei Steven Wu, Vasilis Syrgkanis

Delay-Adaptive Learning in Generalized Linear Contextual Bandits

In this paper, we consider online learning in generalized linear contextual bandits where rewards are not immediately observed. Instead, rewards are available to the decision-maker only after some delay, which is unknown and stochastic. We…

Machine Learning · Computer Science 2020-03-12 Jose Blanchet, Renyuan Xu, Zhengyuan Zhou

Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the…

Machine Learning · Computer Science 2022-06-28 Yifan Lin, Yuhao Wang, Enlu Zhou

Contextual Bandits with Stage-wise Constraints

We study contextual bandits in the presence of a stage-wise constraint when the constraint must be satisfied both with high probability and in expectation. We start with the linear case where both the reward function and the stage-wise…

Machine Learning · Computer Science 2025-08-22 Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett

Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions

We study linear dueling bandits in volatile environments characterized by the simultaneous presence of post-serving contexts, delayed feedback, and adversarial corruption. Feedback is subject to unknown stochastic or adversarial delays and…

Machine Learning · Computer Science 2026-05-20 Youngmin Oh

Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions

We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the reward-dependent delay setting, where realized delays may depend on the stochastic rewards,…

Machine Learning · Computer Science 2021-06-07 Tal Lancewicki, Shahar Segal, Tomer Koren, Yishay Mansour

Stochastic Contextual Bandits with Known Reward Functions

Many sequential decision-making problems in communication networks can be modeled as contextual bandit problems, which are natural extensions of the well-known multi-armed bandit problem. In contextual bandit problems, at each time, an…

Machine Learning · Computer Science 2016-05-10 Pranav Sakulkar, Bhaskar Krishnamachari

OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits

We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure in which the rewards are independent of the contextual information. Algorithms that are designed…

Machine Learning · Statistics 2020-10-07 Niladri S. Chatterji, Vidya Muthukumar, Peter L. Bartlett

Lipschitz Bandits with Stochastic Delayed Feedback

The Lipschitz bandit problem extends stochastic bandits to a continuous action set defined over a metric space, where the expected reward function satisfies a Lipschitz condition. In this work, we introduce a new problem of Lipschitz bandit…

Machine Learning · Computer Science 2026-02-12 Zhongxuan Liu, Yue Kang, Thomas C. M. Lee

Linear and Neural Dueling Bandits with Delayed Feedback

Contextual dueling bandits form a cornerstone of preference-based decision-making, with critical applications in recommender systems and large language model alignment. However, standard algorithms rely on the idealized assumption of…

Machine Learning · Computer Science 2026-05-27 Xiangyi Wang, Pingchen Lu, Jie Mao, Mingze Kong, Zhi Hong, Zhiyong Wang, Zhongxiang Dai

Balanced Linear Contextual Bandits

Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult…

Machine Learning · Computer Science 2018-12-18 Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens

Thompson Sampling for Contextual Bandits with Linear Payoffs

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better…

Machine Learning · Computer Science 2014-02-04 Shipra Agrawal, Navin Goyal

Distributed Stochastic Bandit Learning with Delayed Context Observation

We consider the problem where M agents collaboratively interact with an instance of a stochastic K-armed contextual bandit, where K>>M. The goal of the agents is to simultaneously minimize the cumulative regret over all the agents over a…

Machine Learning · Computer Science 2022-11-16 Jiabin Lin, Shana Moothedath

Best-of-Both-Worlds Linear Contextual Bandits

This study investigates the problem of $K$-armed linear contextual bandits, an instance of the multi-armed bandit problem, under adversarial corruption. At each round, a decision-maker observes an independent and identically distributed…

Machine Learning · Computer Science 2023-12-29 Masahiro Kato, Shinji Ito

Biased Dueling Bandits with Stochastic Delayed Feedback

The dueling bandit problem, an essential variation of the traditional multi-armed bandit problem, has recently gained significant prominence due to its broad applications in online advertising, recommendation systems, information…

Machine Learning · Computer Science 2025-04-08 Bongsoo Yi, Yue Kang, Yao Li

Non-Stationary Bandits under Recharging Payoffs: Improved Planning with Sublinear Regret

The stochastic multi-armed bandit setting has been recently studied in the non-stationary regime, where the mean payoff of each action is a non-decreasing function of the number of rounds passed since it was last played. This model captures…

Machine Learning · Computer Science 2022-10-13 Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai