Related papers: Generating GPU Compiler Heuristics using Reinforce…

200 papers

Compiler optimization decisions are often based on hand-crafted heuristics centered around a few established benchmark suites. Alternatively, they can be learned from feature and performance data produced during compilation. However,…

Programming Languages · Computer Science 2022-06-29 Raphael Mosaner, David Leopoldseder, Wolfgang Kisling, Lukas Stadler, Hanspeter Mössenböck

Recent years have witnessed phenomenal growth in the application and capabilities of Graphical Processing Units (GPUs) due to their high parallel computation power at relatively low cost. However, writing a computationally efficient GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-05 Richard Schoonhoven, Ben van Werkhoven, Kees Joost Batenburg

Optimizing the performance of GPU kernels is challenging for both human programmers and code generators. For example, CUDA programmers must set thread and block parameters for a kernel, but might not have the intuition to make a good…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-06-30 Robert V. Lim, Boyana Norris, Allen D. Malony

Combinatorial optimization is frequently used in computer vision. For instance, in applications like semantic segmentation, human pose estimation and action recognition, programs are formulated for solving inference in Conditional Random…

Computer Vision and Pattern Recognition · Computer Science 2020-05-06 Safa Messaoud, Maghav Kumar, Alexander G. Schwing

Efficient GPU programming is crucial for achieving high performance in deep learning (DL) applications. The performance of GPU programs depends on how data is parallelized across threads and arranged within memory subsystems. The mapping…

Machine Learning · Computer Science 2026-01-30 Xiao Zhang, Yaoyao Ding, Bolin Sun, Yang Hu, Tatiana Shpeisman, Gennady Pekhimenko

Future computing systems, from handhelds to supercomputers, will undoubtedly be more parallel and heterogeneous than today's systems to provide more performance and energy efficiency. Thus, GPUs are increasingly being used to accelerate…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-18 Saeed Taheri, Apan Qasem, Martin Burtscher

Optimizing the performance of computational fluid dynamics (CFD) applications accelerated by graphics processing units (GPUs) is crucial for efficient simulations. In this study, we employed a machine learning-based autotuning technique to…

Performance · Computer Science 2024-02-21 Weicheng Xue, Christopher John Roy

Current AI code generation systems suffer from significant latency bottlenecks due to CPU-GPU data transfers during compilation, execution, and testing phases. We establish theoretical foundations for three complementary approaches to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-15 Adilet Metinov, Gulida M. Kudakeeva, Gulnara D. Kabaeva

The performance of graph programs depends highly on the algorithm, the size and structure of the input graphs, as well as the features of the underlying hardware. No single set of optimizations or one hardware platform works well across all…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-11 Ajay Brahmakshatriya, Yunming Zhang, Changwan Hong, Shoaib Kamil, Julian Shun, Saman Amarasinghe

Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as a minimization objective, or by autotuners to find an optimal configuration for…

Widely used compilers like GCC and LLVM usually have hundreds of optimizations controlled by optimization flags, which are enabled or disabled during compilation to improve runtime performance (e.g., small execution time) of the compiler…

Programming Languages · Computer Science 2023-05-01 Mingxuan Zhu, Dan Hao, Junjie Chen

Nowadays, GPU accelerators are commonly used to speed up general-purpose computing tasks on a variety of hardware. However, due to the diversity of GPU architectures and processed data, optimization of codes for a particular type of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-20 Jiří Filipovič, Jana Hozzová, Amin Nezarat, Jaroslav Oľha, Filip Petrovič

There has been an increased interest in discovering heuristics for combinatorial problems on graphs through machine learning. While existing techniques have primarily focused on obtaining high-quality solutions, scalability to billion-sized…

Machine Learning · Computer Science 2020-12-04 Sahil Manchanda, Akash Mittal, Anuj Dhawan, Sourav Medya, Sayan Ranu, Ambuj Singh

Program synthesis is an umbrella term for generating programs and logical formulae from specifications. With the remarkable performance improvements that GPUs enable for deep learning, a natural question arose: can we also implement a…

Programming Languages · Computer Science 2025-04-29 Martin Berger, Nathanaël Fijalkow, Mojtaba Valizadeh

This report presents some early results on code generation targeting tensor cores on NVIDIA GPUs using the MLIR compiler infrastructure. The state-of-the-art in high-performance deep learning today is primarily driven by manually optimized…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-31 Navdeep Katel, Vivek Khandelwal, Uday Bondhugula

We present a deep reinforcement learning approach to minimizing the execution cost of neural network computation graphs in an optimizing compiler. Unlike earlier learning-based works that require training the optimizer on the same graph to…

Machine Learning · Computer Science 2020-02-11 Aditya Paliwal, Felix Gimeno, Vinod Nair, Yujia Li, Miles Lubin, Pushmeet Kohli, Oriol Vinyals

Compiler auto-tuning optimizes pass sequences to improve performance metrics such as Intermediate Representation (IR) instruction count. Although recent advances leveraging Large Language Models (LLMs) have shown promise in automating…

Machine Learning · Computer Science 2025-06-23 Haolin Pan, Hongyu Lin, Haoran Luo, Yang Liu, Kaichun Yao, Libo Zhang, Mingjie Xing, Yanjun Wu

We present a set of rules to guide the design of GPU algorithms. These rules are grounded in the principle of reducing waste in GPU utility to achieve good speedup. In accordance with these rules, we propose GPU algorithms for 2D…

Graphics · Computer Science 2020-07-02 Zhenghai Chen, Tiow-Seng Tan, Hong-Yang Ong

Estimating the probability with which a conditional branch instruction is taken is an important analysis that enables many optimizations in modern compilers. When using Profile Guided Optimizations (PGO), compilers are able to make a good…

Machine Learning · Computer Science 2022-02-17 Easwaran Raman, Xinliang David Li

Recent studies in using deep learning to solve routing problems focus on construction heuristics, the solutions of which are still far from optimality. Improvement heuristics have great potential to narrow this gap by iteratively refining a…

Artificial Intelligence · Computer Science 2020-05-12 Yaoxin Wu, Wen Song, Zhiguang Cao, Jie Zhang, Andrew Lim