Related papers: LOOPer: A Learned Automatic Code Optimizer For Pol…
In this paper, we present a work in progress about a deep learning based approach for automatic code optimization in polyhedral compilers. The proposed technique explores combinations of affine and non-affine loop transformations to find…
The advancement of machine learning for compiler optimization, particularly within the polyhedral model, is constrained by the scarcity of large-scale, public performance datasets. This data bottleneck forces researchers to undertake costly…
Because loops execute their body many times, compiler developers place much emphasis on their optimization. Nevertheless, in view of highly diverse source code and hardware, compilers still struggle to produce optimal target code. The sheer…
Enabling compilers to automatically optimize code has been a longstanding goal for the compiler community. Efficiently solving this problem requires using precise cost models. These models predict whether applying a sequence of code…
Polyhedral compilers can perform complex loop optimizations that improve parallelism and cache behaviour of loops in the input program. These transformations result in significant performance gains on modern processors which have large…
Polyhedral techniques have been widely used for automatic code optimization in low-level compilers and higher-level processes. Loop optimization is central to this technique, and several polyhedral schedulers like Feautrier, Pluto, isl and…
The key to performance optimization of a program is to decide correctly when a certain transformation should be applied by a compiler. This is an ideal opportunity to apply machine-learning models to speed up the tuning process; while this…
Advanced compiler technology is crucial for enabling machine learning applications to run on novel hardware, but traditional compilers fail to deliver performance, popular auto-tuners have long search times and expert-optimized libraries…
Modern polyhedral compilers excel at aggressively optimizing codes with static control parts, but the state-of-practice to find high-performance polyhedral transformations especially for different hardware targets still largely involves…
Creating high performance implementations of deep learning primitives on CPUs is a challenging task. Multiple considerations including multi-level cache hierarchy, and wide SIMD units of CPU platforms influence the choice of program…
Loop transformations are semantics-preserving optimization techniques, widely used to maximize objectives such as parallelism. Despite decades of research, applying the optimal composition of loop transformations remains challenging due to…
Automatic code optimization remains a difficult challenge, particularly for complex loop nests on modern hardware. This paper investigates a novel approach to code optimization where Large Language Models (LLMs) guide the process through a…
This paper introduces Tiramisu, a polyhedral framework designed to generate high performance code for multiple platforms including multicores, GPUs, and distributed machines. Tiramisu introduces a scheduling language with novel extensions…
The rapid growth of deep learning models has increased the demand for efficient distributed training strategies. Fully sharded approaches like ZeRO-3 and FSDP partition model parameters across GPUs and apply optimizations such as…
Compilers are crucial in optimizing programs and accelerating their execution. However, optimizing programs automatically using compilers is not trivial. Recent work has attempted to use reinforcement learning (RL) to solve this problem. It…
Code often suffers from performance bugs. These bugs necessitate the research and practice of code optimization. Traditional rule-based methods rely on manually designing and maintaining rules for specific performance bugs (e.g., redundant…
Hardware accelerators, especially those designed for tensor processing, have become ubiquitous in today's computing landscape. However, even with significant efforts in building compilers, programming these tensor accelerators remains…
The way heuristic optimizers are designed has evolved over the decades, as computing power has increased. Such has been the case for the Linear Ordering Problem (LOP), a field in which trajectory-based strategies led the way during the…
A compiler processes the code written in a high level language and produces machine executable code. The compiler writers often face the challenge of keeping the compilation times reasonable. That is because aggressive optimization passes…
Major advancements in building general-purpose and customized hardware have been one of the key enablers of versatility and pervasiveness of machine learning models such as deep neural networks. To sustain this ubiquitous deployment of…