Related papers: Generating GPU Compiler Heuristics using Reinforce…
Compiler optimization decisions are often based on hand-crafted heuristics centered around a few established benchmark suites. Alternatively, they can be learned from feature and performance data produced during compilation. However,…
Recent years have witnessed phenomenal growth in the application, and capabilities of Graphical Processing Units (GPUs) due to their high parallel computation power at relatively low cost. However, writing a computationally efficient GPU…
Optimizing the performance of GPU kernels is challenging for both human programmers and code generators. For example, CUDA programmers must set thread and block parameters for a kernel, but might not have the intuition to make a good…
Combinatorial optimization is frequently used in computer vision. For instance, in applications like semantic segmentation, human pose estimation and action recognition, programs are formulated for solving inference in Conditional Random…
Efficient GPU programming is crucial for achieving high performance in deep learning (DL) applications. The performance of GPU programs depends on how data is parallelized across threads and arranged within memory subsystems. The mapping…
Future computing systems, from handhelds to supercomputers, will undoubtedly be more parallel and heterogeneous than todays systems to provide more performance and energy efficiency. Thus, GPUs are increasingly being used to accelerate…
Optimizing the performance of computational fluid dynamics (CFD) applications accelerated by graphics processing units (GPUs) is crucial for efficient simulations. In this study, we employed a machine learning-based autotuning technique to…
Current AI code generation systems suffer from significant latency bottlenecks due to CPU-GPU data transfers during compilation, execution, and testing phases. We establish theoretical foundations for three complementary approaches to…
The performance of graph programs depends highly on the algorithm, the size and structure of the input graphs, as well as the features of the underlying hardware. No single set of optimizations or one hardware platform works well across all…
Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as a minimization objective, or by autotuners to find an optimal configuration for…
Widely used compilers like GCC and LLVM usually have hundreds of optimizations controlled by optimization flags, which are enabled or disabled during compilation to improve runtime performance (e.g., small execution time) of the compiler…
Nowadays, GPU accelerators are commonly used to speed up general-purpose computing tasks on a variety of hardware. However, due to the diversity of GPU architectures and processed data, optimization of codes for a particular type of…
There has been an increased interest in discovering heuristics for combinatorial problems on graphs through machine learning. While existing techniques have primarily focused on obtaining high-quality solutions, scalability to billion-sized…
Program synthesis is an umbrella term for generating programs and logical formulae from specifications. With the remarkable performance improvements that GPUs enable for deep learning, a natural question arose: can we also implement a…
This report presents some early results on code generation targeting tensor cores on NVIDIA GPUs using the MLIR compiler infrastructure. The state-of-the-art in high-performance deep learning today is primarily driven by manually optimized…
We present a deep reinforcement learning approach to minimizing the execution cost of neural network computation graphs in an optimizing compiler. Unlike earlier learning-based works that require training the optimizer on the same graph to…
Compiler auto-tuning optimizes pass sequences to improve performance metrics such as Intermediate Representation (IR) instruction count. Although recent advances leveraging Large Language Models (LLMs) have shown promise in automating…
We present a set of rules to guide the design of GPU algorithms. These rules are grounded on the principle of reducing waste in GPU utility to achieve good speed up. In accordance to these rules, we propose GPU algorithms for 2D…
Estimating the probability with which a conditional branch instruction is taken is an important analysis that enables many optimizations in modern compilers. When using Profile Guided Optimizations (PGO), compilers are able to make a good…
Recent studies in using deep learning to solve routing problems focus on construction heuristics, the solutions of which are still far from optimality. Improvement heuristics have great potential to narrow this gap by iteratively refining a…