Related papers: Online Learning under Delayed Feedback

Optimism and Delays in Episodic Reinforcement Learning

There are many algorithms for regret minimisation in episodic reinforcement learning. This problem is well-understood from a theoretical perspective, provided that the sequences of states, actions and rewards associated with each episode…

Machine Learning · Computer Science 2023-04-07 Benjamin Howson, Ciara Pike-Burke, Sarah Filippi

Learning Adversarial Markov Decision Processes with Delayed Feedback

Reinforcement learning typically assumes that agents observe feedback for their actions immediately, but in many real-world applications (like recommendation systems) feedback is observed with delay. This paper studies online learning in…

Machine Learning · Computer Science 2021-12-16 Tal Lancewicki, Aviv Rosenberg, Yishay Mansour

Delayed Feedback in Generalised Linear Bandits Revisited

The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, the stringent requirement for…

Machine Learning · Computer Science 2023-04-12 Benjamin Howson, Ciara Pike-Burke, Sarah Filippi

Online Strongly Convex Optimization with Unknown Delays

We investigate the problem of online convex optimization with unknown delays, in which the feedback of a decision arrives with an arbitrary delay. Previous studies have presented a delayed variant of online gradient descent (OGD), and…

Machine Learning · Computer Science 2021-03-23 Yuanyu Wan, Wei-Wei Tu, Lijun Zhang

Biased Dueling Bandits with Stochastic Delayed Feedback

The dueling bandit problem, an essential variation of the traditional multi-armed bandit problem, has become significantly prominent recently due to its broad applications in online advertising, recommendation systems, information…

Machine Learning · Computer Science 2025-04-08 Bongsoo Yi, Yue Kang, Yao Li

Bandit Online Learning with Unknown Delays

This paper deals with bandit online learning problems involving feedback of unknown delay that can emerge in multi-armed bandit (MAB) and bandit convex optimization (BCO) settings. MAB and BCO require only values of the objective function…

Machine Learning · Computer Science 2019-05-29 Bingcong Li, Tianyi Chen, Georgios B. Giannakis

A Reduction from Delayed to Immediate Feedback for Online Convex Optimization with Improved Guarantees

We develop a reduction-based framework for online learning with delayed feedback that recovers and improves upon existing results for both first-order and bandit convex optimization. Our approach introduces a continuous-time model under…

Machine Learning · Computer Science 2026-02-04 Alexander Ryabchenko, Idan Attias, Daniel M. Roy

Delaytron: Efficient Learning of Multiclass Classifiers with Delayed Bandit Feedbacks

In this paper, we present an online algorithm called Delaytron for learning multiclass classifiers using delayed bandit feedback. The sequence of feedback delays $\{d_t\}_{t=1}^T$ is unknown to the algorithm. At the $t$-th round, the…

Machine Learning · Computer Science 2022-05-18 Naresh Manwani, Mudit Agarwal

Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback

The standard assumption in reinforcement learning (RL) is that agents observe feedback for their actions immediately. However, in practice feedback is often observed with delay. This paper studies online learning in episodic Markov decision…

Machine Learning · Computer Science 2023-01-24 Tiancheng Jin, Tal Lancewicki, Haipeng Luo, Yishay Mansour, Aviv Rosenberg

A note on continuous-time online learning

In online learning, the data is provided in a sequential order, and the goal of the learner is to make online decisions to minimize overall regrets. This note is concerned with continuous-time models and algorithms for several online…

Machine Learning · Statistics 2024-05-20 Lexing Ying

Handling Delayed Feedback in Distributed Online Optimization: A Projection-Free Approach

Learning at the edges has become increasingly important as large quantities of data are continually generated locally. Among others, this paradigm requires algorithms that are simple (so that they can be executed by local devices), robust…

Machine Learning · Computer Science 2024-02-06 Tuan-Anh Nguyen, Nguyen Kim Thang, Denis Trystram

Bandits with Delayed, Aggregated Anonymous Feedback

We study a variant of the stochastic $K$-armed bandit problem, which we call "bandits with delayed, aggregated anonymous feedback". In this problem, when the player pulls an arm, a reward is generated, however it is not immediately…

Machine Learning · Statistics 2018-06-14 Ciara Pike-Burke, Shipra Agrawal, Csaba Szepesvari, Steffen Grunewalder

Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback

Motivated by applications to online learning in sparse estimation and Bayesian optimization, we consider the problem of online unconstrained nonsubmodular minimization with delayed costs in both full information and bandit feedback…

Machine Learning · Computer Science 2022-06-02 Tianyi Lin, Aldo Pacchiano, Yaodong Yu, Michael I. Jordan

Online Learning with Optimism and Delay

Inspired by the demands of real-time climate and weather forecasting, we develop optimistic online learning algorithms that require no parameter tuning and have optimal regret guarantees under delayed feedback. Our algorithms -- DORM,…

Machine Learning · Computer Science 2021-07-13 Genevieve Flaspohler, Francesco Orabona, Judah Cohen, Soukayna Mouatadid, Miruna Oprescu, Paulo Orenstein, Lester Mackey

Delayed Feedback in Kernel Bandits

Black box optimisation of an unknown function from expensive and noisy evaluations is a ubiquitous problem in machine learning, academic research and industrial production. An abstraction of the problem can be formulated as a kernel based…

Machine Learning · Statistics 2023-02-02 Sattar Vakili, Danyal Ahmed, Alberto Bernacchia, Ciara Pike-Burke

Neural Contextual Bandits Under Delayed Feedback Constraints

This paper presents a new algorithm for neural contextual bandits (CBs) that addresses the challenge of delayed reward feedback, where the reward for a chosen action is revealed after a random, unknown delay. This scenario is common in…

Machine Learning · Computer Science 2025-04-17 Mohammadali Moghimi, Sharu Theresa Jose, Shana Moothedath

Stochastic Submodular Bandits with Delayed Composite Anonymous Bandit Feedback

This paper investigates the problem of combinatorial multiarmed bandits with stochastic submodular (in expectation) rewards and full-bandit delayed feedback, where the delayed feedback is assumed to be composite and anonymous. In other…

Machine Learning · Computer Science 2025-01-23 Mohammad Pedramfar, Vaneet Aggarwal

Lifelong Learning in Multi-Armed Bandits

Continuously learning and leveraging the knowledge accumulated from prior tasks in order to improve future performance is a long standing machine learning problem. In this paper, we study the problem in the multi-armed bandit framework with…

Machine Learning · Computer Science 2020-12-29 Matthieu Jedor, Jonathan Louëdec, Vianney Perchet

Non-stochastic Bandits With Evolving Observations

We introduce a novel online learning framework that unifies and generalizes pre-established models, such as delayed and corrupted feedback, to encompass adversarial environments where action feedback evolves over time. In this setting, the…

Machine Learning · Computer Science 2024-05-28 Yogev Bar-On, Yishay Mansour

Online Learning with Abstention

We present an extensive study of the key problem of online learning where algorithms are allowed to abstain from making predictions. In the adversarial setting, we show how existing online algorithms and guarantees can be adapted to this…

Machine Learning · Computer Science 2019-11-15 Corinna Cortes, Giulia DeSalvo, Claudio Gentile, Mehryar Mohri, Scott Yang