Related papers: Autotuning PolyBench Benchmarks with LLVM Clang/Po…

Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization (extended version)

In this paper, we develop a ytopt autotuning framework that leverages Bayesian optimization to explore the parameter space search and compare four different supervised learning methods within Bayesian optimization and evaluate their…

Machine Learning · Computer Science 2021-04-28 Xingfu Wu, Michael Kruse, Prasanna Balaprakash, Hal Finkel, Paul Hovland, Valerie Taylor, Mary Hall

CATBench: A Compiler Autotuning Benchmarking Suite for Black-box Optimization

Bayesian optimization is a powerful method for automating tuning of compilers. The complex landscape of autotuning provides a myriad of rarely considered structural challenges for black-box optimizers, and the lack of standardized…

Machine Learning · Computer Science 2025-04-09 Jacob O. Tørring, Carl Hvarfner, Luigi Nardi, Magnus Själander

A Performance Vocabulary for Affine Loop Transformations

Modern polyhedral compilers excel at aggressively optimizing codes with static control parts, but the state-of-practice to find high-performance polyhedral transformations especially for different hardware targets still largely involves…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-10 Martin Kong, Louis-Noël Pouchet

Compiler Auto-tuning through Multiple Phase Learning

Widely used compilers like GCC and LLVM usually have hundreds of optimizations controlled by optimization flags, which are enabled or disabled during compilation to improve runtime performance (e.g., small execution time) of the compiler…

Programming Languages · Computer Science 2023-05-01 Mingxuan Zhu, Dan Hao, Junjie Chen

Autotuning Search Space for Loop Transformations

One of the challenges for optimizing compilers is to predict whether applying an optimization will improve its execution speed. Programmers may override the compiler's profitability heuristic using optimization directives such as pragmas in…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-14 Michael Kruse, Hal Finkel, Xingfu Wu

Benchmark of Bayesian Optimization and Metaheuristics for Control Engineering Tuning Problems with Crash Constraints

Controller tuning based on black-box optimization allows to automatically tune performance-critical parameters w.r.t. mostly arbitrary high-level closed-loop control objectives. However, a comprehensive benchmark of different black-box…

Systems and Control · Electrical Eng. & Systems 2022-11-07 David Stenger, Dirk Abel

Benchmarking optimization algorithms for auto-tuning GPU kernels

Recent years have witnessed phenomenal growth in the application, and capabilities of Graphical Processing Units (GPUs) due to their high parallel computation power at relatively low cost. However, writing a computationally efficient GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-05 Richard Schoonhoven, Ben van Werkhoven, Kees Joost Batenburg

Bayesian Optimization for auto-tuning GPU kernels

Finding optimal parameter configurations for tunable GPU kernels is a non-trivial exercise for large search spaces, even when automated. This poses an optimization task on a non-convex search space, using an expensive to evaluate function…

Machine Learning · Computer Science 2021-12-01 Floris-Jan Willemsen, Rob van Nieuwpoort, Ben van Werkhoven

Bayesian Optimization Parameter Tuning Framework for a Lyapunov Based Path Following Controller

Parameter tuning in real-world experiments is constrained by the limited evaluation budget available on hardware. The path-following controller studied in this paper reflects a typical situation in nonlinear geometric controller, where…

Robotics · Computer Science 2026-05-28 Zhewen Zheng, Wenjing Cao, Hongkang Yu, Mo Chen, Takashi Suzuki

Automated Algorithm Design for Auto-Tuning Optimizers

Automatic performance tuning (auto-tuning) is essential for optimizing high-performance applications, where vast and irregular search spaces make manual exploration infeasible. While auto-tuners traditionally rely on classical approaches…

Machine Learning · Computer Science 2026-04-01 Floris-Jan Willemsen, Niki van Stein, Ben van Werkhoven

Improving Computational Cost of Bayesian Optimization for Controller Tuning with a Multi-stage Tuning Framework

Control auto-tuning for industrial and robotic systems, when framed as an optimization problem, provides an excellent means to tune these systems. However, most optimization methods are computationally costly, and this is problematic for…

Computational Engineering, Finance, and Science · Computer Science 2024-11-11 Marlon J. Ares-Milian, Gregory Provan, Marcos Quinones-Grueiro

Customized Monte Carlo Tree Search for LLVM/Polly's Composable Loop Optimization Transformations

Polly is the LLVM project's polyhedral loop nest optimizer. Recently, user-directed loop transformation pragmas were proposed based on LLVM/Clang and Polly. The search space exposed by the transformation pragmas is a tree, wherein each node…

Programming Languages · Computer Science 2021-05-12 Jaehoon Koo, Prasanna Balaprakash, Michael Kruse, Xingfu Wu, Paul Hovland, Mary Hall

Data-efficient Auto-tuning with Bayesian Optimization: An Industrial Control Study

Bayesian optimization is proposed for automatic learning of optimal controller parameters from experimental data. A probabilistic description (a Gaussian process) is used to model the unknown function from controller parameters to a…

Systems and Control · Computer Science 2019-01-24 Matthias Neumann-Brosig, Alonso Marco, Dieter Schwarzmann, Sebastian Trimpe

Efficient Parameter Tuning for a Structure-Based Virtual Screening HPC Application

Virtual screening applications are highly parameterized to optimize the balance between quality and execution performance. While output quality is critical, the entire screening process must be completed within a reasonable time. In fact, a…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-22 Bruno Guindani, Davide Gadioli, Roberto Rocco, Danilo Ardagna, Gianluca Palermo

Using hardware performance counters to speed up autotuning convergence on GPUs

Nowadays, GPU accelerators are commonly used to speed up general-purpose computing tasks on a variety of hardware. However, due to the diversity of GPU architectures and processed data, optimization of codes for a particular type of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-20 Jiří Filipovič, Jana Hozzová, Amin Nezarat, Jaroslav Oľha, Filip Petrovič

Machine Learning Based Auto-tuning for Enhanced OpenCL Performance Portability

Heterogeneous computing, which combines devices with different architectures, is rising in popularity, and promises increased performance combined with reduced energy consumption. OpenCL has been proposed as a standard for programming such…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-15 Thomas L. Falch, Anne C. Elster

Autotuning Benchmarking Techniques: A Roofline Model Case Study

Peak performance metrics published by vendors often do not correspond to what can be achieved in practice. It is therefore of great interest to do extensive benchmarking on core applications and library routines. Since DGEMM is one of the…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-19 Jacob Odgård Tørring, Jan Christian Meyer, Anne C. Elster

ytopt: Autotuning Scientific Applications for Energy Efficiency at Large Scales

As we enter the exascale computing era, efficiently utilizing power and optimizing the performance of scientific applications under power and energy constraints has become critical and challenging. We propose a low-overhead autotuning…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-03-30 Xingfu Wu, Prasanna Balaprakash, Michael Kruse, Jaehoon Koo, Brice Videau, Paul Hovland, Valerie Taylor, Brad Geltz, Siddhartha Jana, Mary Hall

Analyzing Search Techniques for Autotuning Image-based GPU Kernels: The Impact of Sample Sizes

Modern computing systems are increasingly more complex, with their multicore CPUs and GPUs accelerators changing yearly, if not more often. It thus has become very challenging to write programs that efficiently use the associated complex…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-28 Jacob O. Tørring, Anne C. Elster

A Benchmark Set of Highly-efficient CUDA and OpenCL Kernels and its Dynamic Autotuning with Kernel Tuning Toolkit

Autotuning of performance-relevant source-code parameters allows to automatically tune applications without hard coding optimizations and thus helps with keeping the performance portable. In this paper, we introduce a benchmark set of ten…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-02 Filip Petrovič, David Střelák, Jana Hozzová, Jaroslav Oľha, Richard Trembecký, Siegfried Benkner, Jiří Filipovič