Related papers: ACC Saturator: Automatic Kernel Optimization for D…

Sketch-Guided Equality Saturation: Scaling Equality Saturation to Complex Optimizations of Functional Programs

Generating high-performance code for diverse hardware and application domains is challenging. Functional array programming languages with patterns like map and reduce have been successfully combined with term rewriting to define and explore…

Programming Languages · Computer Science 2022-06-06 Thomas Koehler, Phil Trinder, Michel Steuwer

A Tool for Automatically Suggesting Source-Code Optimizations for Complex GPU Kernels

Future computing systems, from handhelds to supercomputers, will undoubtedly be more parallel and heterogeneous than today's systems to provide more performance and energy efficiency. Thus, GPUs are increasingly being used to accelerate…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-18 Saeed Taheri, Apan Qasem, Martin Burtscher

Equality Saturation: A New Approach to Optimization

Optimizations in a traditional compiler are applied sequentially, with each optimization destructively modifying the program to produce a transformed program that is then passed to the next optimization. We present a new approach for…

Programming Languages · Computer Science 2015-07-01 Ross Tate, Michael Stepp, Zachary Tatlock, Sorin Lerner

Analytical Performance Estimation during Code Generation on Modern GPUs

Automatic code generation is frequently used to create implementations of algorithms specifically tuned to particular hardware and application parameters. The code generation process involves the selection of adequate code transformations,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-08 Dominik Ernst, Markus Holzer, Georg Hager, Matthias Knorr, Gerhard Wellein

Autotuning GPU Kernels via Static and Predictive Analysis

Optimizing the performance of GPU kernels is challenging for both human programmers and code generators. For example, CUDA programmers must set thread and block parameters for a kernel, but might not have the intuition to make a good…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-06-30 Robert V. Lim, Boyana Norris, Allen D. Malony

A Data-driven Analysis of Code Optimizations

As the demand for computational power grows, optimizing code through compilers becomes increasingly crucial. In this context, we focus on fully automatic code optimization techniques that automate the process of selecting and applying code…

Programming Languages · Computer Science 2025-11-11 Yacine Hakimi, Riyadh Baghdadi

Comprehensive Optimization of Parametric Kernels for Graphics Processing Units

This work deals with the optimization of computer programs targeting Graphics Processing Units (GPUs). The goal is to lift, from programmers to optimizing compilers, the heavy burden of determining program details that are dependent on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-16 Xiaohui Chen, Marc Moreno-Maza, Jeeva Paudel, Ning Xie

A Two-Stage GPU Kernel Tuner Combining Semantic Refactoring and Search-Based Optimization

GPU code optimization is a key performance bottleneck for HPC workloads as well as large-model training and inference. Although compiler optimizations and hand-written kernels can partially alleviate this issue, achieving…

Computation and Language · Computer Science 2026-01-26 Qiuyi Qu, Yicheng Sui, Yufei Sun, Rui Chen, Xiaofei Zhang, Yuzhi Zhang, Haofeng Wang, Ge Lan

Benchmarking optimization algorithms for auto-tuning GPU kernels

Recent years have witnessed phenomenal growth in the application and capabilities of Graphical Processing Units (GPUs) due to their high parallel computation power at relatively low cost. However, writing a computationally efficient GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-05 Richard Schoonhoven, Ben van Werkhoven, Kees Joost Batenburg

Equality Saturation for Tensor Graph Superoptimization

One of the major optimizations employed in deep learning frameworks is graph rewriting. Production frameworks rely on heuristics to decide if rewrite rules should be applied and in which order. Prior research has shown that one can discover…

Artificial Intelligence · Computer Science 2021-03-18 Yichen Yang, Phitchaya Mangpo Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy, Jacques Pienaar

Adaptive Neural Compilation

This paper proposes an adaptive neural-compilation framework to address the problem of efficient program learning. Traditional code optimisation strategies used in compilers are based on applying a pre-specified set of transformations that…

Artificial Intelligence · Computer Science 2016-05-27 Rudy Bunel, Alban Desmaison, Pushmeet Kohli, Philip H. S. Torr, M. Pawan Kumar

Speedup for quantum optimal control from automatic differentiation based on graphics processing units

We implement a quantum optimal control algorithm based on automatic differentiation and harness the acceleration afforded by graphics processing units (GPUs). Automatic differentiation allows us to specify advanced optimization criteria and…

Quantum Physics · Physics 2017-04-19 Nelson Leung, Mohamed Abdelhafez, Jens Koch, David I. Schuster

Generating Binary Optimal Codes Using Heterogeneous Parallel Computing

Generation of optimal codes is a well-known problem in coding theory. Many computational approaches exist in the literature for finding record-breaking codes. However, generating codes with long lengths $n$ using serial algorithms is…

Information Theory · Computer Science 2015-07-21 Srajan Paliwal, Saurabh Tiwary, Bhaskar Chaudhury, Manish K. Gupta

Accelerating Exact and Approximate Inference for (Distributed) Discrete Optimization with GPUs

Discrete optimization is a central problem in artificial intelligence. The optimization of the aggregated cost of a network of cost functions arises in a variety of problems including (W)CSP, DCOP, as well as optimization in stochastic…

Artificial Intelligence · Computer Science 2018-01-12 Ferdinando Fioretto, Enrico Pontelli, William Yeoh, Rina Dechter

Towards a Benchmarking Suite for Kernel Tuners

As computing systems become more complex, it is becoming harder for programmers to keep their codes optimized as the hardware gets updated. Autotuners try to alleviate this by hiding as many architecture-based optimization details as…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-03-17 Jacob O. Tørring, Ben van Werkhoven, Filip Petrovic, Floris-Jan Willemsen, Jirí Filipovic, Anne C. Elster

Machine Learning-driven Autotuning of Graphics Processing Unit Accelerated Computational Fluid Dynamics for Enhanced Performance

Optimizing the performance of computational fluid dynamics (CFD) applications accelerated by graphics processing units (GPUs) is crucial for efficient simulations. In this study, we employed a machine learning-based autotuning technique to…

Performance · Computer Science 2024-02-21 Weicheng Xue, Christopher John Roy

On the Optimization of Equivalent Concurrent Computations

In this submission, we explore the use of equality saturation to optimize concurrent computations. A concurrent environment gives rise to new optimization opportunities, like extracting a common concurrent subcomputation. To our knowledge,…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-15 Henrich Lauko, Lukáš Korenčik, Peter Goodman

Understanding the Power of Evolutionary Computation for GPU Code Optimization

Achieving high performance for GPU codes requires developers to have significant knowledge in parallel programming and GPU architectures, and in-depth understanding of the application. This combination makes it challenging to find…

Software Engineering · Computer Science 2022-08-29 Jhe-Yu Liou, Muaaz Awan, Steven Hofmeyr, Stephanie Forrest, Carole-Jean Wu

Sparsity-Specific Code Optimization using Expression Trees

We introduce a code generator that converts unoptimized C++ code operating on sparse data into vectorized and parallel CPU or GPU kernels. Our approach unrolls the computation into a massive expression graph, performs redundant expression…

Programming Languages · Computer Science 2022-03-15 Philipp Herholz, Xuan Tang, Teseo Schneider, Shoaib Kamil, Daniele Panozzo, Olga Sorkine-Hornung

Equality Saturation for Optimizing High-Level Julia IR

Compilers are indispensable for transforming code written in high-level languages into performant machine code, but their general-purpose optimizations sometimes fall short. Domain experts might be aware of certain optimizations that the…

Programming Languages · Computer Science 2025-07-15 Jules Merckx, Tim Besard, Bjorn De Sutter