
A Reduction-based Framework for Sequential Decision Making with Delayed Feedback

Machine Learning 2024-03-07 v5

Abstract

We study stochastic delayed feedback in general multi-agent sequential decision making, which includes bandits, single-agent Markov decision processes (MDPs), and Markov games (MGs). We propose a novel reduction-based framework that turns any multi-batched algorithm for sequential decision making with instantaneous feedback into a sample-efficient algorithm that can handle stochastic delays. By plugging different multi-batched algorithms into our framework, we provide several examples demonstrating that our framework not only matches or improves existing results for bandits, tabular MDPs, and tabular MGs, but also initiates the study of delays in sequential decision making with function approximation. In summary, we provide a complete set of sharp results for multi-agent sequential decision making with delayed feedback.
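The abstract does not spell out how the reduction works. As a rough illustration only, here is a minimal sketch of one natural reading of the idea. A multi-batched algorithm changes its policy only between batches, so a wrapper can keep running the current batch's fixed policy until the number of feedback samples the base algorithm expects for that batch has actually arrived, and only then hand them over for the policy update. Everything named here is hypothetical and is not the authors' algorithm or its guarantees: `BatchedElimination`, `run_with_delays`, the doubling batch schedule, the rounded-exponential delay model, and the choice to discard feedback that arrives after its batch has ended.

```python
import heapq
import math
import random


class BatchedElimination:
    """Hypothetical toy multi-batched bandit algorithm (illustration only).

    The policy is frozen within a batch (round-robin over surviving arms) and
    is updated only at batch boundaries, via confidence-interval elimination.
    """

    def __init__(self, n_arms, first_batch=8, delta=0.05):
        self.active = list(range(n_arms))
        self.batch_len = first_batch
        self.log_term = math.log(2 * n_arms / delta)
        self.sums = [0.0] * n_arms
        self.counts = [0] * n_arms
        self._next = 0

    def policy(self):
        # Round-robin over the arms that survived the last batch.
        arm = self.active[self._next % len(self.active)]
        self._next += 1
        return arm

    def _bounds(self, arm):
        # (lower, upper) confidence bounds; unobserved arms are never eliminated.
        n = self.counts[arm]
        if n == 0:
            return -math.inf, math.inf
        mean = self.sums[arm] / n
        rad = math.sqrt(self.log_term / (2 * n))
        return mean - rad, mean + rad

    def end_batch(self, feedback):
        # feedback: (arm, reward) pairs observed for the batch that just ended.
        for arm, reward in feedback:
            self.sums[arm] += reward
            self.counts[arm] += 1
        best_lcb = max(self._bounds(a)[0] for a in self.active)
        # Drop arms whose upper bound falls below the best lower bound.
        self.active = [a for a in self.active if self._bounds(a)[1] >= best_lcb]
        self.batch_len *= 2  # doubling batch schedule (an assumption)


def run_with_delays(base, arm_means, n_batches, mean_delay, seed=0):
    """Run `base` under random delays; return (rounds played, feedback per batch)."""
    rng = random.Random(seed)
    t = 0
    pending = []  # min-heap of (arrival_time, batch_id, arm, reward)
    consumed = []
    for b in range(n_batches):
        need = base.batch_len
        arrived = []
        # Keep playing this batch's fixed policy until `need` of its own
        # feedback samples have actually been observed.
        while len(arrived) < need:
            arm = base.policy()
            reward = 1.0 if rng.random() < arm_means[arm] else 0.0
            delay = 0 if mean_delay == 0 else int(rng.expovariate(1.0 / mean_delay))
            heapq.heappush(pending, (t + delay, b, arm, reward))
            # Deliver all feedback whose delay has elapsed by round t.
            while pending and pending[0][0] <= t:
                _, batch_id, a, r = heapq.heappop(pending)
                # Late feedback from earlier batches is discarded here.
                if batch_id == b and len(arrived) < need:
                    arrived.append((a, r))
            t += 1
        base.end_batch(arrived)
        consumed.append(len(arrived))
    return t, consumed
```

With `mean_delay=0` every sample arrives immediately, so the wrapper plays exactly the base algorithm's batch sizes (8 + 16 + 32 + 64 = 120 rounds for four batches). With positive delays it plays extra rounds in each batch, while the base algorithm still receives the same number of samples per batch as it would with instantaneous feedback.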


Cite

@article{arxiv.2302.01477,
  title  = {A Reduction-based Framework for Sequential Decision Making with Delayed Feedback},
  author = {Yunchang Yang and Han Zhong and Tianhao Wu and Bin Liu and Liwei Wang and Simon S. Du},
  journal= {arXiv preprint arXiv:2302.01477},
  year   = {2024}
}

Comments

Accepted by NeurIPS 2023. arXiv admin note: text overlap with arXiv:2110.14555 by other authors.
