A Unified Algorithm for Stochastic Path Problems

Machine Learning 2022-10-18 v1

Abstract

We study reinforcement learning in stochastic path (SP) problems. The goal in these problems is to maximize the expected sum of rewards until the agent reaches a terminal state. We provide the first regret guarantees in this general problem by analyzing a simple optimistic algorithm. Our regret bound matches the best known results for the well-studied special case of stochastic shortest path (SSP) with all non-positive rewards. For SSP, we present an adaptation procedure for the case when the scale of rewards $B_\star$ is unknown. We show that there is no price for adaptation, and our regret bound matches that with a known $B_\star$. We also provide a scale adaptation procedure for the special case of stochastic longest paths (SLP) where all rewards are non-negative. However, unlike in SSP, we show through a lower bound that there is an unavoidable price for adaptation.
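
To make the objective concrete, the following is a minimal sketch of a stochastic path problem: an agent collects rewards until it reaches a terminal state, and a policy is scored by its expected total reward. It is not the optimistic algorithm analyzed in the paper. The toy model, the state and action names, and the helpers `step`, `episode_return`, and `estimate_value` are all hypothetical, chosen only for illustration. Every reward here is non-positive, so the toy instance falls under the SSP special case.

```python
import random

TERMINAL = "goal"

# Hypothetical toy model (an assumption, not taken from the paper):
# TRANSITIONS[state][action] is a list of (probability, next_state, reward).
# All rewards are non-positive, which corresponds to the SSP special case.
TRANSITIONS = {
    "s0": {
        "safe": [(1.0, "s1", -1.0)],
        "risky": [(0.8, TERMINAL, -1.0), (0.2, "s0", -1.0)],
    },
    "s1": {
        "safe": [(1.0, TERMINAL, -1.0)],
        "risky": [(1.0, TERMINAL, -1.0)],
    },
}


def step(state, action, rng):
    """Sample one transition and return (next_state, reward)."""
    outcomes = TRANSITIONS[state][action]
    u = rng.random()
    cumulative = 0.0
    for prob, next_state, reward in outcomes:
        cumulative += prob
        if u < cumulative:
            return next_state, reward
    # Fall back to the last outcome to guard against rounding error.
    _, next_state, reward = outcomes[-1]
    return next_state, reward


def episode_return(policy, rng, start="s0", max_steps=10_000):
    """Sum the rewards collected until the terminal state is reached."""
    state, total = start, 0.0
    for _ in range(max_steps):
        if state == TERMINAL:
            break
        state, reward = step(state, policy[state], rng)
        total += reward
    return total


def estimate_value(policy, episodes=20_000, seed=0):
    """Monte Carlo estimate of the expected total reward of a fixed policy."""
    rng = random.Random(seed)
    total = sum(episode_return(policy, rng) for _ in range(episodes))
    return total / episodes


safe_policy = {"s0": "safe", "s1": "safe"}
risky_policy = {"s0": "risky", "s1": "risky"}
```

In this toy instance the safe policy always takes exactly two steps, so its return is -2. The risky policy terminates with probability 0.8 at each step, so its expected return is -1/0.8 = -1.25, which makes it the better of the two. A learning algorithm would have to discover this from interaction, without access to `TRANSITIONS`.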

Cite

@article{arxiv.2210.09255,
  title  = {A Unified Algorithm for Stochastic Path Problems},
  author = {Christoph Dann and Chen-Yu Wei and Julian Zimmert},
  journal= {arXiv preprint arXiv:2210.09255},
  year   = {2022}
}