Loo.py: transformation-based code generation for GPUs and CPUs

Programming Languages 2014-06-02 v1 Mathematical Software Numerical Analysis

Abstract

Today's highly heterogeneous computing landscape places a burden on programmers wanting to achieve high performance on a reasonably broad cross-section of machines. To do so, computations need to be expressed in many different but mathematically equivalent ways, with, in the worst case, one variant per target machine. Loo.py, a programming system embedded in Python, meets this challenge by defining a data model for array-style computations and a library of transformations that operate on this model. Offering transformations such as loop tiling, vectorization, storage management, unrolling, instruction-level parallelism, change of data layout, and many more, it provides a convenient way to capture, parametrize, and re-unify the growth among code variants. Optional, deep integration with numpy and PyOpenCL provides a convenient computing environment where the transition from prototype to high-performance implementation can occur in a gradual, machine-assisted form.
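To make "different but mathematically equivalent" concrete, the following plain-Python sketch writes one array computation twice: once as a single loop and once with the loop tiled into an outer and an inner loop. It does not use Loo.py's actual API; the function names `scale_untiled` and `scale_tiled` and the `tile` parameter are hypothetical and chosen only for illustration. A transformation-based system would derive the second form from the first mechanically rather than by hand.

```python
def scale_untiled(a, factor):
    """Reference variant: one flat loop over all elements."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = factor * a[i]
    return out


def scale_tiled(a, factor, tile=4):
    """Tiled variant: the same iteration space, split into blocks of `tile`.

    Hypothetical hand-written illustration of loop tiling; not Loo.py code.
    """
    n = len(a)
    out = [0.0] * n
    for i_outer in range(0, n, tile):
        # The last tile may be shorter when n is not a multiple of `tile`.
        for i_inner in range(min(tile, n - i_outer)):
            i = i_outer + i_inner
            out[i] = factor * a[i]
    return out


data = [float(x) for x in range(10)]
# Both variants visit every index exactly once, so the results agree.
assert scale_untiled(data, 2.0) == scale_tiled(data, 2.0, tile=4)
```

The tile size is the kind of parameter that would differ per target machine, which is why capturing the variants as parametrized transformations of one model can be preferable to maintaining each variant by hand.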

Cite

@article{arxiv.1405.7470,
  title  = {Loo.py: transformation-based code generation for GPUs and CPUs},
  author = {Andreas Klöckner},
  journal= {arXiv preprint arXiv:1405.7470},
  year   = {2014}
}