Automatically Batching Control-Intensive Programs for Modern Accelerators

Distributed, Parallel, and Cluster Computing; Machine Learning; Programming Languages (v2, 2020-03-13)

Abstract

We present a general approach to batching arbitrary computations for accelerators such as GPUs. We show orders-of-magnitude speedups using our method on the No U-Turn Sampler (NUTS), a workhorse algorithm in Bayesian statistics. The central challenge of batching NUTS and other Markov chain Monte Carlo algorithms is data-dependent control flow and recursion. We overcome this by mechanically transforming a single-example implementation into a form that explicitly tracks the current program point for each batch member, and only steps forward those in the same place. We present two different batching algorithms: a simpler, previously published one that inherits recursion from the host Python, and a more complex, novel one that implements recursion directly and can batch across it. We implement these batching methods as a general program transformation on Python source. Both the batching system and the NUTS implementation presented here are available as part of the popular TensorFlow Probability software package.
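
To make the idea of tracking a program point per batch member concrete, here is a minimal hand-written sketch in NumPy. It is not the paper's transformation or the TensorFlow Probability API: the toy program (counting Collatz steps), the block labels `TEST`, `HALVE`, `TRIPLE`, `HALT`, and the rule of running the lowest-numbered block that has any waiting member are all assumptions chosen for illustration. Recursion is not covered.

```python
import numpy as np

def collatz_steps(n):
    """Single-example program with data-dependent control flow."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

# Hypothetical labels for the basic blocks of the program above.
TEST, HALVE, TRIPLE, HALT = 0, 1, 2, 3

def collatz_steps_batched(ns):
    """Batched version: one program counter per batch member.

    Assumes every input is a positive integer.
    """
    n = np.asarray(ns, dtype=np.int64).copy()
    steps = np.zeros_like(n)
    pc = np.full_like(n, TEST)          # current program point of each member
    while np.any(pc != HALT):
        # Assumed scheduling rule: run the lowest-numbered block
        # that still has at least one member waiting at it.
        block = pc[pc != HALT].min()
        active = pc == block            # members at the same program point
        if block == TEST:
            # Branch: each active member picks its own next block.
            target = np.where(n == 1, HALT,
                              np.where(n % 2 == 0, HALVE, TRIPLE))
            pc = np.where(active, target, pc)
        else:
            # Compute the update for the whole batch, keep it only
            # for active members; the rest are left untouched.
            new_n = n // 2 if block == HALVE else 3 * n + 1
            n = np.where(active, new_n, n)
            steps = np.where(active, steps + 1, steps)
            pc = np.where(active, TEST, pc)
    return steps
```

Calling `collatz_steps_batched([1, 2, 3, 6, 7])` returns one step count per input, matching `collatz_steps` applied to each element separately. Each pass through the loop issues vectorized operations over the whole batch and masks out members that are elsewhere in the program, which is the property that lets divergent control flow share accelerator work.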

Cite

@article{arxiv.1910.11141,
  title  = {Automatically Batching Control-Intensive Programs for Modern Accelerators},
  author = {Alexey Radul and Brian Patton and Dougal Maclaurin and Matthew D. Hoffman and Rif A. Saurous},
  journal = {arXiv preprint arXiv:1910.11141},
  year   = {2020}
}

Comments

10 pages; Machine Learning and Systems 2020
