Related papers: Adaptive Experimentation with Delayed Binary Feedb…

200 papers

The dueling bandit problem, an essential variation of the traditional multi-armed bandit problem, has become significantly prominent recently due to its broad applications in online advertising, recommendation systems, information…

Machine Learning · Computer Science 2025-04-08 Bongsoo Yi, Yue Kang, Yao Li

In a conventional contextual multi-armed bandit problem, the feedback (or reward) is immediately observable after an action. Nevertheless, delayed feedback arises in numerous real-life situations and is particularly crucial in…

Machine Learning · Computer Science 2024-05-21 Kweiguu Liu, Setareh Maghsudi

Multi-armed bandit (MAB) algorithms have been increasingly used to complement or integrate with A/B tests and randomized clinical trials in e-commerce, healthcare, and policymaking. Recent developments incorporate possible delayed feedback.…

Methodology · Statistics 2023-07-04 Lei Shi, Jingshen Wang, Tianhao Wu

The multi-armed bandit (MAB) models have attracted significant research attention due to their applicability and effectiveness in various real-world scenarios such as resource allocation, online advertising, and dynamic pricing. As an…

Machine Learning · Computer Science 2024-02-13 Yandi Li, Jianxiong Guo, Yupeng Li, Tian Wang, Weijia Jia

Online learning with delayed feedback has received increasing attention recently due to its several applications in distributed, web-based learning problems. In this paper we provide a systematic study of the topic, and analyze the effect…

Machine Learning · Computer Science 2015-07-02 Pooria Joulani, András György, Csaba Szepesvári

This paper investigates the problem of combinatorial multiarmed bandits with stochastic submodular (in expectation) rewards and full-bandit delayed feedback, where the delayed feedback is assumed to be composite and anonymous. In other…

Machine Learning · Computer Science 2025-01-23 Mohammad Pedramfar, Vaneet Aggarwal

Predicting the expected value or number of post-click conversions (purchases or other events) is a key task in performance-based digital advertising. In training a conversion optimizer model, one of the most crucial aspects is handling…

Machine Learning · Computer Science 2021-01-08 Ashwinkumar Badanidiyuru, Andrew Evdokimov, Vinodh Krishnan, Pan Li, Wynn Vonnegut, Jayden Wang

Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation. One of the main challenges faced by…

In this paper we initiate the study of optimization of bandit type problems in scenarios where the feedback of a play is not immediately known. This arises naturally in allocation problems which have been studied extensively in the…

Data Structures and Algorithms · Computer Science 2015-03-17 Sudipto Guha, Kamesh Munagala, Martin Pal

Equitably allocating limited resources in high-stakes domains, such as education, employment, and healthcare, requires balancing short-term utility with long-term impact, while accounting for delayed outcomes, hidden heterogeneity, and…

Artificial Intelligence · Computer Science 2025-11-17 Mohammadsina Almasi, Hadis Anahideh

Firms implementing digital advertising campaigns face a complex problem in determining the right match between their advertising creatives and target audiences. Typical solutions to the problem have leveraged non-experimental methods, or…

Machine Learning · Computer Science 2019-09-06 Tong Geng, Xiliang Lin, Harikesh S. Nair

Recommender systems are a ubiquitous feature of online platforms. Increasingly, they are explicitly tasked with increasing users' long-term satisfaction. In this context, we study a content exploration task, which we formalize as a…

Machine Learning · Computer Science 2023-07-21 Thomas M. McDonald, Lucas Maystre, Mounia Lalmas, Daniel Russo, Kamil Ciosek

Increasingly, recommender systems are tasked with improving users' long-term satisfaction. In this context, we study a content exploration task, which we formalize as a bandit problem with delayed rewards. There is an apparent trade-off in…

Machine Learning · Computer Science 2025-01-15 Kelly W. Zhang, Thomas Baldwin-McDonald, Kamil Ciosek, Lucas Maystre, Daniel Russo

Online advertising and product recommendation are important domains of applications for multi-armed bandit methods. In these fields, the reward that is immediately available is most often only a proxy for the actual outcome of interest,…

Machine Learning · Computer Science 2017-07-13 Claire Vernade, Olivier Cappé, Vianney Perchet

A survey is performed of various Multi-Armed Bandit (MAB) strategies in order to examine their performance in circumstances exhibiting non-stationary stochastic reward functions in conjunction with delayed feedback. We run several MAB…

Machine Learning · Computer Science 2019-07-31 Larkin Liu, Richard Downe, Joshua Reid

Adaptive experiments are used extensively in online platforms, healthcare and biotechnology, and a variety of other settings. In many of these applications, the main goal is not to precisely estimate a treatment effect, but to demonstrate…

Statistics Theory · Mathematics 2026-03-10 Guido Imbens, Lorenzo Masoero, Alexander Rakhlin, Thomas S. Richardson, Suhas Vijaykumar

Dynamical systems can autonomously adapt their organization so that the required target dynamics is reproduced. In the previous Rapid Communication [Phys. Rev. E 90, 030901(R) (2014)], it was shown how such systems can be designed using…

Adaptation and Self-Organizing Systems · Physics 2016-11-04 Pablo Kaluza, Alexander S. Mikhailov

We consider the adversarial multi-armed bandit problem under delayed feedback. We analyze variants of the Exp3 algorithm that tune their step-size using only information (about the losses and delays) available at the time of the decisions,…

Machine Learning · Computer Science 2020-10-14 András György, Pooria Joulani

In this paper, we provide a general framework for studying multi-agent online learning problems in the presence of delays and asynchronicities. Specifically, we propose and analyze a class of adaptive dual averaging schemes in which agents…

Machine Learning · Computer Science 2022-04-19 Yu-Guan Hsieh, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos

We propose a generalization of the best arm identification problem in stochastic multi-armed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback. The delay in feedback increases the effective sample…
