Max Weight Learning Algorithms with Application to Scheduling in Unknown Environments
Abstract
We consider a discrete time stochastic queueing system where a controller makes a 2-stage decision every slot. The decision at the first stage reveals a hidden source of randomness with a control-dependent (but unknown) probability distribution. The decision at the second stage incurs a penalty vector that depends on this revealed randomness. The goal is to stabilize all queues and minimize a convex function of the time average penalty vector subject to an additional set of time average penalty constraints. This setting fits a wide class of stochastic optimization problems. This includes problems of opportunistic scheduling in wireless networks, where a 2-stage decision about channel measurement and packet transmission must be made every slot without knowledge of the underlying transmission success probabilities. We develop a simple max-weight algorithm that learns efficient behavior by averaging functionals of previous outcomes. The algorithm yields performance that can be pushed arbitrarily close to optimal, with a tradeoff in convergence time and delay.
Keywords
Cite
@article{arxiv.0902.0630,
title = {Max Weight Learning Algorithms with Application to Scheduling in Unknown Environments},
author = {Michael J. Neely},
journal= {arXiv preprint arXiv:0902.0630},
year = {2009}
}
Comments
30 pages, to be presented at the ITA workshop, University of California at San Diego, Feb. 2009