Related papers: LoopStack: a Lightweight Tensor Algebra Compiler S…

LoopTune: Optimizing Tensor Computations with Reinforcement Learning

Advanced compiler technology is crucial for enabling machine learning applications to run on novel hardware, but traditional compilers fail to deliver performance, popular auto-tuners have long search times and expert-optimized libraries…

Machine Learning · Computer Science 2023-11-09 Dejan Grubisic, Bram Wasti, Chris Cummins, John Mellor-Crummey, Aleksandar Zlateski

Towards an Achievable Performance for the Loop Nests

Numerous code optimization techniques, including loop nest optimizations, have been developed over the last four decades. Loop optimization techniques transform loop nests to improve the performance of the code on a target architecture,…

Performance · Computer Science 2019-11-27 Aniket Shivam, Neftali Watkinson, Alexandru Nicolau, David Padua, Alexander V. Veidenbaum

SparseAuto: An Auto-Scheduler for Sparse Tensor Computations Using Recursive Loop Nest Restructuring

Automated code generation and performance enhancements for sparse tensor algebra have become essential in many real-world applications, such as quantum computing, physical simulations, computational chemistry, and machine learning. General…

Programming Languages · Computer Science 2024-08-20 Adhitha Dias, Logan Anderson, Kirshanthan Sundararajah, Artem Pelenitsyn, Milind Kulkarni

Minimum Cost Loop Nests for Contraction of a Sparse Tensor with a Tensor Network

Sparse tensor decomposition and completion are common in numerous applications, ranging from machine learning to computational quantum chemistry. Typically, the main bottleneck in optimization of these models is contractions of a single…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-17 Raghavendra Kanakagiri, Edgar Solomonik

Learning to Make Compiler Optimizations More Effective

Because loops execute their body many times, compiler developers place much emphasis on their optimization. Nevertheless, in view of highly diverse source code and hardware, compilers still struggle to produce optimal target code. The sheer…

Programming Languages · Computer Science 2021-03-01 Rahim Mammadli, Marija Selakovic, Felix Wolf, Michael Pradel

SparseLNR: Accelerating Sparse Tensor Computations Using Loop Nest Restructuring

Sparse tensor algebra computations have become important in many real-world applications like machine learning, scientific simulations, and data mining. Hence, automated code generation and performance optimizations for tensor algebra…

Programming Languages · Computer Science 2022-05-25 Adhitha Dias, Kirshanthan Sundararajah, Charitha Saumya, Milind Kulkarni

PassNet: Scaling Large Language Models for Graph Compiler Pass Generation

Modern tensor compilers such as TorchInductor deliver substantial speedups on mainstream models, yet face a systematic performance ceiling on long-tail workloads -- our profiling shows that 43% of real-world subgraphs experience end-to-end…

Artificial Intelligence · Computer Science 2026-05-29 Yiqun Liu, Yingsheng Wu, Ruqi Yang, Enrong Zheng, Honglei Qiu, Sijun He, Tai Liang, Jingjing Wu, Yuhan Zhou, Yiwei Zhang, Dongyan Chen, Weihan Yi, Xinqi Li, Siqi Bao

Autocomp: A Powerful and Portable Code Optimizer for Tensor Accelerators

Hardware accelerators, especially those designed for tensor processing, have become ubiquitous in today's computing landscape. However, even with significant efforts in building compilers, programming these tensor accelerators remains…

Programming Languages · Computer Science 2025-11-07 Charles Hong, Sahil Bhatia, Alvin Cheung, Yakun Sophia Shao

MCompiler: A Synergistic Compilation Framework

This paper presents a meta-compilation framework, the MCompiler. The main idea is that different segments of a program can be compiled with different compilers/optimizers and combined into a single executable. The MCompiler can be used in a…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-31 Aniket Shivam, Alexandru Nicolau, Alexander V. Veidenbaum

AI Powered Compiler Techniques for DL Code Optimization

Creating high performance implementations of deep learning primitives on CPUs is a challenging task. Multiple considerations including multi-level cache hierarchy, and wide SIMD units of CPU platforms influence the choice of program…

Programming Languages · Computer Science 2021-04-13 Sanket Tavarageri, Gagandeep Goyal, Sasikanth Avancha, Bharat Kaul, Ramakrishna Upadrasta

ACT: Automatically Generating Compiler Backends from Tensor Accelerator ISA Descriptions

Tensor compilers play a key role in enabling high-performance implementations of deep learning workloads. These compilers rely on existing CPU and GPU code generation backends to generate device-specific code. Recently, many tensor…

Programming Languages · Computer Science 2025-10-14 Devansh Jain, Akash Pardeshi, Marco Frigo, Krut Patel, Kaustubh Khulbe, Jai Arora, Charith Mendis

Stack operation of tensor networks

The tensor network, as a factorization of tensors, aims at performing the operations that are common for normal tensors, such as addition, contraction and stacking. However, due to its non-unique network structure, only the tensor network…

Machine Learning · Computer Science 2022-05-25 Tianning Zhang, Tianqi Chen, Erping Li, Bo Yang, L. K. Ang

Compilation of Modular and General Sparse Workspaces

Recent years have seen considerable work on compiling sparse tensor algebra expressions. This paper addresses a shortcoming in that work, namely how to generate efficient code (in time and space) that scatters values into a sparse result…

Programming Languages · Computer Science 2024-04-09 Genghan Zhang, Olivia Hsu, Fredrik Kjolstad

Stardust: Compiling Sparse Tensor Algebra to a Reconfigurable Dataflow Architecture

We introduce Stardust, a compiler that compiles sparse tensor algebra to reconfigurable dataflow architectures (RDAs). Stardust introduces new user-provided data representation and scheduling language constructs for mapping to…

Programming Languages · Computer Science 2022-11-08 Olivia Hsu, Alexander Rucker, Tian Zhao, Kunle Olukotun, Fredrik Kjolstad

Compressing Structured Tensor Algebra

Tensor algebra is a crucial component for data-intensive workloads such as machine learning and scientific computing. As the complexity of data grows, scientists often encounter a dilemma between the highly specialized dense tensor algebra…

Programming Languages · Computer Science 2024-07-19 Mahdi Ghorbani, Emilien Bauer, Tobias Grosser, Amir Shaikhha

CoNST: Code Generator for Sparse Tensor Networks

Sparse tensor networks are commonly used to represent contractions over sparse tensors. Tensor contractions are higher-order analogs of matrix multiplication. Tensor networks arise commonly in many domains of scientific computing and data…

Programming Languages · Computer Science 2024-01-11 Saurabh Raje, Yufan Xu, Atanas Rountev, Edward F. Valeev, Saday Sadayappan

Utilizing Static Analysis and Code Generation to Accelerate Neural Networks

As datasets continue to grow, neural network (NN) applications are becoming increasingly limited by both the amount of available computational power and the ease of developing high-performance applications. Researchers often must have…

Neural and Evolutionary Computing · Computer Science 2012-07-03 Lawrence McAfee, Kunle Olukotun

Bring Your Own Codegen to Deep Learning Compiler

Deep neural networks (DNNs) have been ubiquitously applied in many applications, and accelerators have emerged as an enabler to support the fast and efficient inference tasks of these applications. However, to achieve high model coverage…

Machine Learning · Computer Science 2021-05-10 Zhi Chen, Cody Hao Yu, Trevor Morris, Jorn Tuyls, Yi-Hsiang Lai, Jared Roesch, Elliott Delaye, Vin Sharma, Yida Wang

N-TORC: Native Tensor Optimizer for Real-time Constraints

Compared to overlay-based tensor architectures like VTA or Gemmini, compilers that directly translate machine learning models into a dataflow architecture as HLS code, such as HLS4ML and FINN, generally can achieve lower latency by…

Hardware Architecture · Computer Science 2025-04-08 Suyash Vardhan Singh, Iftakhar Ahmad, David Andrews, Miaoqing Huang, Austin R. J. Downey, Jason D. Bakos

Cortex: A Compiler for Recursive Deep Learning Models

Optimizing deep learning models is generally performed in two steps: (i) high-level graph optimizations such as kernel fusion and (ii) low level kernel optimizations such as those found in vendor libraries. This approach often leaves…

Machine Learning · Computer Science 2021-03-08 Pratik Fegade, Tianqi Chen, Phillip B. Gibbons, Todd C. Mowry