Related papers: Online Learning under Delayed Feedback

200 papers

There are many algorithms for regret minimisation in episodic reinforcement learning. This problem is well-understood from a theoretical perspective, providing that the sequences of states, actions and rewards associated with each episode…

Machine Learning · Computer Science 2023-04-07 Benjamin Howson, Ciara Pike-Burke, Sarah Filippi

Reinforcement learning typically assumes that agents observe feedback for their actions immediately, but in many real-world applications (like recommendation systems) feedback is observed with a delay. This paper studies online learning in…

Machine Learning · Computer Science 2021-12-16 Tal Lancewicki, Aviv Rosenberg, Yishay Mansour

The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, the stringent requirement for…

Machine Learning · Computer Science 2023-04-12 Benjamin Howson, Ciara Pike-Burke, Sarah Filippi

We investigate the problem of online convex optimization with unknown delays, in which the feedback of a decision arrives with an arbitrary delay. Previous studies have presented a delayed variant of online gradient descent (OGD), and…

Machine Learning · Computer Science 2021-03-23 Yuanyu Wan, Wei-Wei Tu, Lijun Zhang
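The delayed variant of online gradient descent (OGD) mentioned in the entry above can be illustrated with a minimal sketch. This is not the algorithm of the listed paper; it assumes a one-dimensional decision set [-1, 1], that the gradient for round t becomes available at the end of round t + d_t, and that the learner applies each gradient (evaluated at the point it played in that round) as soon as it arrives. The names `delayed_ogd` and `project` are hypothetical.

```python
from collections import defaultdict


def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval [lo, hi] (assumed decision set)."""
    return max(lo, min(hi, x))


def delayed_ogd(grad_fns, delays, eta):
    """Projected OGD where the feedback of round t arrives delays[t] rounds late.

    grad_fns[t](x) returns the gradient of the round-t loss at x; the learner
    only gets to use it once round t + delays[t] has ended.
    Returns the list of decisions played in each round.
    """
    x = 0.0
    arrivals = defaultdict(list)  # round index -> gradients revealed at its end
    plays = []
    for t in range(len(grad_fns)):
        plays.append(x)
        # The gradient is evaluated at the point actually played in round t,
        # but it is only revealed at the end of round t + delays[t].
        arrivals[t + delays[t]].append(grad_fns[t](x))
        # Apply every gradient revealed at the end of this round, in order.
        for g in arrivals.pop(t, []):
            x = project(x - eta * g)
    return plays
```

With all delays equal to zero this reduces to ordinary projected OGD; a positive delay simply postpones each update, so the played points lag behind the non-delayed trajectory (for example, with every delay equal to 2, the first three rounds all play the initial point).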

The dueling bandit problem, an essential variation of the traditional multi-armed bandit problem, has become significantly prominent recently due to its broad applications in online advertising, recommendation systems, information…

Machine Learning · Computer Science 2025-04-08 Bongsoo Yi, Yue Kang, Yao Li

This paper deals with bandit online learning problems involving feedback of unknown delay that can emerge in multi-armed bandit (MAB) and bandit convex optimization (BCO) settings. MAB and BCO require only values of the objective function…

Machine Learning · Computer Science 2019-05-29 Bingcong Li, Tianyi Chen, Georgios B. Giannakis

We develop a reduction-based framework for online learning with delayed feedback that recovers and improves upon existing results for both first-order and bandit convex optimization. Our approach introduces a continuous-time model under…

Machine Learning · Computer Science 2026-02-04 Alexander Ryabchenko, Idan Attias, Daniel M. Roy

In this paper, we present an online algorithm called Delaytron for learning multi-class classifiers using delayed bandit feedback. The sequence of feedback delays $\{d_t\}_{t=1}^T$ is unknown to the algorithm. At the $t$-th round, the…

Machine Learning · Computer Science 2022-05-18 Naresh Manwani, Mudit Agarwal

The standard assumption in reinforcement learning (RL) is that agents observe feedback for their actions immediately. However, in practice feedback is often observed with a delay. This paper studies online learning in episodic Markov decision…

Machine Learning · Computer Science 2023-01-24 Tiancheng Jin, Tal Lancewicki, Haipeng Luo, Yishay Mansour, Aviv Rosenberg

In online learning, the data is provided in a sequential order, and the goal of the learner is to make online decisions to minimize overall regret. This note is concerned with continuous-time models and algorithms for several online…

Machine Learning · Statistics 2024-05-20 Lexing Ying

Learning at the edges has become increasingly important as large quantities of data are continually generated locally. Among others, this paradigm requires algorithms that are simple (so that they can be executed by local devices), robust…

Machine Learning · Computer Science 2024-02-06 Tuan-Anh Nguyen, Nguyen Kim Thang, Denis Trystram

We study a variant of the stochastic $K$-armed bandit problem, which we call "bandits with delayed, aggregated anonymous feedback". In this problem, when the player pulls an arm, a reward is generated, however it is not immediately…

Machine Learning · Statistics 2018-06-14 Ciara Pike-Burke, Shipra Agrawal, Csaba Szepesvari, Steffen Grunewalder
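To make the "bandits with delayed, aggregated anonymous feedback" setting from the entry above more concrete, here is a small simulator of the feedback model. It is a sketch under assumptions, not the paper's algorithm: it assumes, as the problem's name suggests, that the player never sees individual rewards and instead observes, at the end of each round, only the sum of the rewards whose delays expire in that round, with no indication of which pulls produced them. Bernoulli rewards, the `policy` and `delay_sampler` callables, and the function name are hypothetical choices for illustration.

```python
import random
from collections import defaultdict


def simulate_aggregated_feedback(arm_means, policy, delay_sampler, T, seed=0):
    """Simulate T rounds of a K-armed bandit with delayed, aggregated, anonymous feedback.

    arm_means[k]        -- success probability of arm k (Bernoulli rewards, assumed)
    policy(t, history)  -- returns the arm to pull in round t
    delay_sampler(rng)  -- returns a non-negative integer delay for the new reward
    Returns history: a list of (arm pulled, aggregate reward observed) per round.
    """
    rng = random.Random(seed)
    pending = defaultdict(float)  # arrival round -> sum of rewards arriving then
    history = []
    for t in range(T):
        arm = policy(t, history)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        # The reward is generated now but only revealed at round t + delay.
        pending[t + delay_sampler(rng)] += reward
        # Anonymous aggregation: the player sees one number per round and
        # cannot attribute it to the arm pulled in this or any earlier round.
        observed = pending.pop(t, 0.0)
        history.append((arm, observed))
    return history
```

For example, with delays 2, 1 and 0 on three pulls of an arm that always pays 1, the player observes 0, 0 and then a single aggregate of 3 in round 2, with nothing to tell it which pulls contributed.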

Motivated by applications to online learning in sparse estimation and Bayesian optimization, we consider the problem of online unconstrained nonsubmodular minimization with delayed costs in both full information and bandit feedback…

Machine Learning · Computer Science 2022-06-02 Tianyi Lin, Aldo Pacchiano, Yaodong Yu, Michael I. Jordan

Inspired by the demands of real-time climate and weather forecasting, we develop optimistic online learning algorithms that require no parameter tuning and have optimal regret guarantees under delayed feedback. Our algorithms -- DORM,…

Black box optimisation of an unknown function from expensive and noisy evaluations is a ubiquitous problem in machine learning, academic research and industrial production. An abstraction of the problem can be formulated as a kernel based…

Machine Learning · Statistics 2023-02-02 Sattar Vakili, Danyal Ahmed, Alberto Bernacchia, Ciara Pike-Burke

This paper presents a new algorithm for neural contextual bandits (CBs) that addresses the challenge of delayed reward feedback, where the reward for a chosen action is revealed after a random, unknown delay. This scenario is common in…

Machine Learning · Computer Science 2025-04-17 Mohammadali Moghimi, Sharu Theresa Jose, Shana Moothedath

This paper investigates the problem of combinatorial multiarmed bandits with stochastic submodular (in expectation) rewards and full-bandit delayed feedback, where the delayed feedback is assumed to be composite and anonymous. In other…

Machine Learning · Computer Science 2025-01-23 Mohammad Pedramfar, Vaneet Aggarwal

Continuously learning and leveraging the knowledge accumulated from prior tasks in order to improve future performance is a long-standing machine learning problem. In this paper, we study the problem in the multi-armed bandit framework with…

Machine Learning · Computer Science 2020-12-29 Matthieu Jedor, Jonathan Louëdec, Vianney Perchet

We introduce a novel online learning framework that unifies and generalizes pre-established models, such as delayed and corrupted feedback, to encompass adversarial environments where action feedback evolves over time. In this setting, the…

Machine Learning · Computer Science 2024-05-28 Yogev Bar-On, Yishay Mansour

We present an extensive study of the key problem of online learning where algorithms are allowed to abstain from making predictions. In the adversarial setting, we show how existing online algorithms and guarantees can be adapted to this…

Machine Learning · Computer Science 2019-11-15 Corinna Cortes, Giulia DeSalvo, Claudio Gentile, Mehryar Mohri, Scott Yang