Markov Decision Processing Networks

Optimization and Control; Networking and Internet Architecture; Probability. 2025-09-30 (v1)

Abstract

We introduce Markov Decision Processing Networks (MDPNs), a multiclass queueing network model in which service is a controlled, finite-state Markov process. The service process is decision-dependent: the actions taken influence future service availability. Viewed as a two-sided queueing model, an MDPN captures settings such as assemble-to-order systems, ride-hailing platforms, cross-skilled call centers, and quantum switches. We first characterize the capacity region of MDPNs. Unlike in classical switched networks, the MDPN capacity region depends on the long-run mix of service states induced by the control of the underlying service process. We show, via a counterexample, that MaxWeight is not throughput-optimal in this class, which distinguishes MDPNs from classical queueing models. To bridge this gap, we design a weighted average reward policy, obtained from a multiobjective MDP that leverages a two-timescale separation at the fluid scale, and we prove that the resulting policy is throughput-optimal. The techniques yield a clear description of the capacity region and apply to a broad family of two-sided matching systems.
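The dependence of capacity on the long-run mix of service states can be illustrated with a toy calculation. The sketch below is not the paper's construction: the two service states, two actions, two job classes, and all transition and rate values (`P_BY_ACTION`, `MU_BY_ACTION`) are hypothetical numbers chosen for illustration. It fixes a stationary deterministic policy on the service chain, computes the stationary distribution it induces, and reports the resulting long-run average service rate per class.

```python
import itertools
import numpy as np

# Hypothetical toy data (assumptions, not taken from the paper).
# P_BY_ACTION[a][s] is the transition row out of service state s under action a.
P_BY_ACTION = {
    0: np.array([[0.9, 0.1], [0.5, 0.5]]),
    1: np.array([[0.2, 0.8], [0.1, 0.9]]),
}
# MU_BY_ACTION[a][s] is the per-class service rate vector in state s under action a.
MU_BY_ACTION = {
    0: np.array([[1.0, 0.0], [0.5, 0.0]]),  # action 0 serves class 0
    1: np.array([[0.0, 0.5], [0.0, 1.0]]),  # action 1 serves class 1
}


def stationary_distribution(P):
    """Stationary distribution of an irreducible row-stochastic matrix P."""
    n = P.shape[0]
    # Stack the balance equations pi (P - I) = 0 with the normalisation sum(pi) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def long_run_service_rates(policy):
    """Average service rate per class when policy[s] is the action used in state s."""
    states = range(len(policy))
    P = np.array([P_BY_ACTION[policy[s]][s] for s in states])
    mu = np.array([MU_BY_ACTION[policy[s]][s] for s in states])
    return stationary_distribution(P) @ mu


if __name__ == "__main__":
    # Enumerate all four stationary deterministic policies of the toy chain.
    for policy in itertools.product([0, 1], repeat=2):
        print(policy, long_run_service_rates(policy))
```

In this toy instance, different policies spend different fractions of time in each service state, so the achievable average service rates change with the control and not only with the instantaneous schedule.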

Cite

@article{arxiv.2509.24541,
  title   = {Markov Decision Processing Networks},
  author  = {Sanidhay Bhambay and Thirupathaiah Vasantam and Neil Walton},
  journal = {arXiv preprint arXiv:2509.24541},
  year    = {2025}
}