Related papers: Adaptive Experimentation with Delayed Binary Feedb…

Biased Dueling Bandits with Stochastic Delayed Feedback

The dueling bandit problem, an essential variation of the traditional multi-armed bandit problem, has become significantly prominent recently due to its broad applications in online advertising, recommendation systems, information…

Machine Learning · Computer Science 2025-04-08 Bongsoo Yi, Yue Kang, Yao Li

Budgeted Recommendation with Delayed Feedback

In a conventional contextual multi-armed bandit problem, the feedback (or reward) is immediately observable after an action. Nevertheless, delayed feedback arises in numerous real-life situations and is particularly crucial in…

Machine Learning · Computer Science 2024-05-21 Kweiguu Liu, Setareh Maghsudi

Statistical Inference on Multi-armed Bandits with Delayed Feedback

Multi-armed bandit (MAB) algorithms have been increasingly used to complement or integrate with A/B tests and randomized clinical trials in e-commerce, healthcare, and policymaking. Recent developments incorporate possible delayed feedback.…

Methodology · Statistics 2023-07-04 Lei Shi, Jingshen Wang, Tianhao Wu

Adversarial Bandits with Multi-User Delayed Feedback: Theory and Application

The multi-armed bandit (MAB) models have attracted significant research attention due to their applicability and effectiveness in various real-world scenarios such as resource allocation, online advertising, and dynamic pricing. As an…

Machine Learning · Computer Science 2024-02-13 Yandi Li, Jianxiong Guo, Yupeng Li, Tian Wang, Weijia Jia

Online Learning under Delayed Feedback

Online learning with delayed feedback has received increasing attention recently due to its several applications in distributed, web-based learning problems. In this paper we provide a systematic study of the topic, and analyze the effect…

Machine Learning · Computer Science 2015-07-02 Pooria Joulani, András György, Csaba Szepesvári

Stochastic Submodular Bandits with Delayed Composite Anonymous Bandit Feedback

This paper investigates the problem of combinatorial multiarmed bandits with stochastic submodular (in expectation) rewards and full-bandit delayed feedback, where the delayed feedback is assumed to be composite and anonymous. In other…

Machine Learning · Computer Science 2025-01-23 Mohammad Pedramfar, Vaneet Aggarwal

Handling many conversions per click in modeling delayed feedback

Predicting the expected value or number of post-click conversions (purchases or other events) is a key task in performance-based digital advertising. In training a conversion optimizer model, one of the most crucial aspects is handling…

Machine Learning · Computer Science 2021-01-08 Ashwinkumar Badanidiyuru, Andrew Evdokimov, Vinodh Krishnan, Pan Li, Wynn Vonnegut, Jayden Wang

Linear Bandits with Stochastic Delayed Feedback

Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation. One of the main challenges faced by…

Machine Learning · Statistics 2020-03-03 Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brueckner

Multiarmed Bandit Problems with Delayed Feedback

In this paper we initiate the study of optimization of bandit type problems in scenarios where the feedback of a play is not immediately known. This arises naturally in allocation problems which have been studied extensively in the…

Data Structures and Algorithms · Computer Science 2015-03-17 Sudipto Guha, Kamesh Munagala, Martin Pal

Bi-Level Contextual Bandits for Individualized Resource Allocation under Delayed Feedback

Equitably allocating limited resources in high-stakes domains (such as education, employment, and healthcare) requires balancing short-term utility with long-term impact, while accounting for delayed outcomes, hidden heterogeneity, and…

Artificial Intelligence · Computer Science 2025-11-17 Mohammadsina Almasi, Hadis Anahideh

Online Evaluation of Audiences for Targeted Advertising via Bandit Experiments

Firms implementing digital advertising campaigns face a complex problem in determining the right match between their advertising creatives and target audiences. Typical solutions to the problem have leveraged non-experimental methods, or…

Machine Learning · Computer Science 2019-09-06 Tong Geng, Xiliang Lin, Harikesh S. Nair

Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay

Recommender systems are a ubiquitous feature of online platforms. Increasingly, they are explicitly tasked with increasing users' long-term satisfaction. In this context, we study a content exploration task, which we formalize as a…

Machine Learning · Computer Science 2023-07-21 Thomas M. McDonald, Lucas Maystre, Mounia Lalmas, Daniel Russo, Kamil Ciosek

Impatient Bandits: Optimizing for the Long-Term Without Delay

Increasingly, recommender systems are tasked with improving users' long-term satisfaction. In this context, we study a content exploration task, which we formalize as a bandit problem with delayed rewards. There is an apparent trade-off in…

Machine Learning · Computer Science 2025-01-15 Kelly W. Zhang, Thomas Baldwin-McDonald, Kamil Ciosek, Lucas Maystre, Daniel Russo

Stochastic Bandit Models for Delayed Conversions

Online advertising and product recommendation are important domains of applications for multi-armed bandit methods. In these fields, the reward that is immediately available is most often only a proxy for the actual outcome of interest,…

Machine Learning · Computer Science 2017-07-13 Claire Vernade, Olivier Cappé, Vianney Perchet

Multi-Armed Bandit Strategies for Non-Stationary Reward Distributions and Delayed Feedback Processes

A survey is performed of various Multi-Armed Bandit (MAB) strategies in order to examine their performance in circumstances exhibiting non-stationary stochastic reward functions in conjunction with delayed feedback. We run several MAB…

Machine Learning · Computer Science 2019-07-31 Larkin Liu, Richard Downe, Joshua Reid

Demonstration Experiments

Adaptive experiments are used extensively in online platforms, healthcare and biotechnology, and a variety of other settings. In many of these applications, the main goal is not to precisely estimate a treatment effect, but to demonstrate…

Statistics Theory · Mathematics 2026-03-10 Guido Imbens, Lorenzo Masoero, Alexander Rakhlin, Thomas S. Richardson, Suhas Vijaykumar

Autonomous Learning by Dynamical Systems with Inertial or Delayed Feedbacks

Dynamical systems can autonomously adapt their organization so that the required target dynamics is reproduced. In the previous Rapid Communication [Phys. Rev. E 90, 030901(R) (2014)], it was shown how such systems can be designed using…

Adaptation and Self-Organizing Systems · Physics 2016-11-04 Pablo Kaluza, Alexander S. Mikhailov

Adapting to Delays and Data in Adversarial Multi-Armed Bandits

We consider the adversarial multi-armed bandit problem under delayed feedback. We analyze variants of the Exp3 algorithm that tune their step-size using only information (about the losses and delays) available at the time of the decisions,…

Machine Learning · Computer Science 2020-10-14 András György, Pooria Joulani

Multi-Agent Online Optimization with Delays: Asynchronicity, Adaptivity, and Optimism

In this paper, we provide a general framework for studying multi-agent online learning problems in the presence of delays and asynchronicities. Specifically, we propose and analyze a class of adaptive dual averaging schemes in which agents…

Machine Learning · Computer Science 2022-04-19 Yu-Guan Hsieh, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos

Best arm identification in multi-armed bandits with delayed feedback

We propose a generalization of the best arm identification problem in stochastic multi-armed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback. The delay in feedback increases the effective sample…

Machine Learning · Computer Science 2018-03-30 Aditya Grover, Todor Markov, Peter Attia, Norman Jin, Nicholas Perkins, Bryan Cheong, Michael Chen, Zi Yang, Stephen Harris, William Chueh, Stefano Ermon