Related papers: Learning Compiler Pass Orders using Coreset and No…
Compiler optimization relies on sequences of passes to improve program performance. Selecting and ordering these passes automatically, known as compiler auto-tuning, is challenging due to the large and complex search space. Existing…
Compiler pass auto-tuning is critical for enhancing software performance, yet finding the optimal pass sequence for a specific program is an NP-hard problem. Traditional, general-purpose optimization flags like -O3 and -Oz adopt a…
The ever increasing memory requirements of several applications has led to increased demands which might not be met by embedded devices. Constraining the usage of memory in such cases is of paramount importance. It is important that such…
Compiler pass selection and phase ordering present a significant challenge in achieving optimal program performance, particularly for objectives like code size reduction. Standard compiler heuristics offer general applicability but often…
Compiler writers typically focus primarily on the performance of the generated program binaries when selecting the passes and the order in which they are applied in the standard optimization levels, such as GCC -O3. In some domains, such as…
Compiler phase ordering has a strong effect on program performance. Finding an effective sequence of passes is still a difficult task because the search space is large and execution time, code size and energy consumption often conflict.…
Instruction combiner (IC) is a critical compiler optimization pass, which replaces a sequence of instructions with an equivalent and optimized instruction sequence at basic block level. There can be thousands of instruction-combining…
Selecting the right compiler optimisations has a severe impact on programs' performance. Still, the available optimisations keep increasing, and their effect depends on the specific program, making the task human intractable. Researchers…
Modern deployment often requires trading accuracy for efficiency under tight CPU and memory constraints, yet common compression proxies such as parameter count or FLOPs do not reliably predict wall-clock inference time. In particular,…
The performance of the code a compiler generates depends on the order in which it applies the optimization passes. Choosing a good order--often referred to as the phase-ordering problem, is an NP-hard problem. As a result, existing…
Automatically generating compilable programs with (or without) natural language descriptions has always been a touchstone problem for computational linguistics and automated software engineering. Existing deep-learning approaches model code…
Because loops execute their body many times, compiler developers place much emphasis on their optimization. Nevertheless, in view of highly diverse source code and hardware, compilers still struggle to produce optimal target code. The sheer…
Widely used compilers like GCC and LLVM usually have hundreds of optimizations controlled by optimization flags, which are enabled or disabled during compilation to improve runtime performance (e.g., small execution time) of the compiler…
This paper describes learning in a compiler for algorithms solving classes of the logic minimization problem MINSAT, where the underlying propositional formula is in conjunctive normal form (CNF) and where costs are associated with the…
Compiler optimization decisions are often based on hand-crafted heuristics centered around a few established benchmark suites. Alternatively, they can be learned from feature and performance data produced during compilation. However,…
Large language models (LLMs) remain brittle in multi-step structured workflows, where errors compound across sequential transformations, validation stages, and stateful operations such as SQL persistence. We present PlanCompiler, a…
The rapid growth of deep learning models has increased the demand for efficient distributed training strategies. Fully sharded approaches like ZeRO-3 and FSDP partition model parameters across GPUs and apply optimizations such as…
Coreset selection is powerful in reducing computational costs and accelerating data processing for deep learning algorithms. It strives to identify a small subset from large-scale data, so that training only on the subset practically…
With the increasing capabilities of quantum systems, the efficient, practical execution of quantum programs is becoming more critical. Each execution includes compilation time, which accounts for substantial overhead of the overall program…
The advancement of machine learning for compiler optimization, particularly within the polyhedral model, is constrained by the scarcity of large-scale, public performance datasets. This data bottleneck forces researchers to undertake costly…