Related papers: Contextual Linear Bandits with Delay as Payoff

200 papers

In this paper, we investigate a variant of the classical stochastic Multi-armed Bandit (MAB) problem, where the payoff received by an agent (either cost or reward) is both delayed and directly corresponds to the magnitude of the delay.…

Machine Learning · Computer Science 2024-10-16 Ofir Schlisselberg, Ido Cohen, Tal Lancewicki, Yishay Mansour

The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, the stringent requirement for…

Machine Learning · Computer Science 2023-04-12 Benjamin Howson, Ciara Pike-Burke, Sarah Filippi

In this paper, we study the problem of stochastic linear bandits with finite action sets. Most existing work assumes that the payoffs are bounded or sub-Gaussian, which may be violated in some scenarios such as financial markets. To settle…

Machine Learning · Computer Science 2020-04-29 Bo Xue, Guanghui Wang, Yimu Wang, Lijun Zhang

We consider a multi-armed bandit problem where payoffs are a linear function of an observed stochastic contextual variable. In the scenario where there exists a gap between optimal and suboptimal rewards, several algorithms have been…

Data Structures and Algorithms · Computer Science 2014-07-08 José Bento, Stratis Ioannidis, S. Muthukrishnan, Jinyun Yan

This paper studies semiparametric contextual bandits, a generalization of the linear stochastic bandit problem where the reward for an action is modeled as a linear function of known action features confounded by a non-linear…

Machine Learning · Statistics 2018-07-17 Akshay Krishnamurthy, Zhiwei Steven Wu, Vasilis Syrgkanis

In this paper, we consider online learning in generalized linear contextual bandits where rewards are not immediately observed. Instead, rewards are available to the decision-maker only after some delay, which is unknown and stochastic. We…

Machine Learning · Computer Science 2020-03-12 Jose Blanchet, Renyuan Xu, Zhengyuan Zhou

In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the…

Machine Learning · Computer Science 2022-06-28 Yifan Lin, Yuhao Wang, Enlu Zhou

We study contextual bandits in the presence of a stage-wise constraint when the constraint must be satisfied both with high probability and in expectation. We start with the linear case where both the reward function and the stage-wise…

Machine Learning · Computer Science 2025-08-22 Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett

We study linear dueling bandits in volatile environments characterized by the simultaneous presence of post-serving contexts, delayed feedback, and adversarial corruption. Feedback is subject to unknown stochastic or adversarial delays and…

Machine Learning · Computer Science 2026-05-20 Youngmin Oh

We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the reward-dependent delay setting, where realized delays may depend on the stochastic rewards,…

Machine Learning · Computer Science 2021-06-07 Tal Lancewicki, Shahar Segal, Tomer Koren, Yishay Mansour

Many sequential decision-making problems in communication networks can be modeled as contextual bandit problems, which are natural extensions of the well-known multi-armed bandit problem. In contextual bandit problems, at each time, an…

Machine Learning · Computer Science 2016-05-10 Pranav Sakulkar, Bhaskar Krishnamachari

We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure in which the rewards are independent of the contextual information. Algorithms that are designed…

Machine Learning · Statistics 2020-10-07 Niladri S. Chatterji, Vidya Muthukumar, Peter L. Bartlett

The Lipschitz bandit problem extends stochastic bandits to a continuous action set defined over a metric space, where the expected reward function satisfies a Lipschitz condition. In this work, we introduce a new problem of Lipschitz bandit…

Machine Learning · Computer Science 2026-02-12 Zhongxuan Liu, Yue Kang, Thomas C. M. Lee

Contextual dueling bandits form a cornerstone of preference-based decision-making, with critical applications in recommender systems and large language model alignment. However, standard algorithms rely on the idealized assumption of…

Machine Learning · Computer Science 2026-05-27 Xiangyi Wang, Pingchen Lu, Jie Mao, Mingze Kong, Zhi Hong, Zhiyong Wang, Zhongxiang Dai

Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult…

Machine Learning · Computer Science 2018-12-18 Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better…

Machine Learning · Computer Science 2014-02-04 Shipra Agrawal, Navin Goyal

We consider the problem where M agents collaboratively interact with an instance of a stochastic K-armed contextual bandit, where K>>M. The goal of the agents is to simultaneously minimize the cumulative regret over all the agents over a…

Machine Learning · Computer Science 2022-11-16 Jiabin Lin, Shana Moothedath

This study investigates the problem of $K$-armed linear contextual bandits, an instance of the multi-armed bandit problem, under adversarial corruption. At each round, a decision-maker observes an independent and identically distributed…

Machine Learning · Computer Science 2023-12-29 Masahiro Kato, Shinji Ito

The dueling bandit problem, an essential variation of the traditional multi-armed bandit problem, has become significantly prominent recently due to its broad applications in online advertising, recommendation systems, information…

Machine Learning · Computer Science 2025-04-08 Bongsoo Yi, Yue Kang, Yao Li

The stochastic multi-armed bandit setting has been recently studied in the non-stationary regime, where the mean payoff of each action is a non-decreasing function of the number of rounds passed since it was last played. This model captures…

Machine Learning · Computer Science 2022-10-13 Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai