Related papers: Codestitcher: Inter-Procedural Basic Block Layout …

Improved Basic Block Reordering

Basic block reordering is an important step for profile-guided binary optimization. The state-of-the-art goal for basic block reordering is to maximize the number of fall-through branches. However, we demonstrate that such orderings may…

Programming Languages · Computer Science 2020-04-14 Andy Newell, Sergey Pupyrev

Copy-and-Patch Compilation: A fast compilation algorithm for high-level languages and bytecode

Fast compilation is important when compilation occurs at runtime, such as query compilers in modern database systems and WebAssembly virtual machines in modern browsers. We present copy-and-patch, an extremely fast compilation technique…

Programming Languages · Computer Science 2021-09-16 Haoran Xu, Fredrik Kjolstad

Optimizing Function Layout for Mobile Applications

Function layout, also referred to as function reordering or function placement, is one of the most effective profile-guided compiler optimizations. By reordering functions in a binary, compilers are able to greatly improve the performance…

Programming Languages · Computer Science 2022-11-18 Ellis Hoag, Kyungwoo Lee, Julián Mestre, Sergey Pupyrev

Improving Readability of Scratch Programs with Search-based Refactoring

Block-based programming languages like Scratch have become increasingly popular as introductory languages for novices. These languages are intended to be used with a "tinkering" approach which allows learners and teachers to quickly…

Software Engineering · Computer Science 2021-08-17 Felix Adler, Gordon Fraser, Eva Gründinger, Nina Körber, Simon Labrenz, Jonas Lerchenberger, Stephan Lukasczyk, Sebastian Schweikl

FasterPy: An LLM-based Code Execution Efficiency Optimization Framework

Code often suffers from performance bugs. These bugs necessitate the research and practice of code optimization. Traditional rule-based methods rely on manually designing and maintaining rules for specific performance bugs (e.g., redundant…

Software Engineering · Computer Science 2025-12-30 Yue Wu, Minghao Han, Ruiyin Li, Peng Liang, Amjed Tahir, Zengyang Li, Qiong Feng, Mojtaba Shahin

Coded Caching Schemes with Reduced Subpacketization from Linear Block Codes

Coded caching is a technique that generalizes conventional caching and promises significant reductions in traffic over caching networks. However, the basic coded caching scheme requires that each file hosted in the server be partitioned…

Information Theory · Computer Science 2018-02-20 Li Tang, Aditya Ramamoorthy

CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance

Existing methods fail to effectively steer Large Language Models (LLMs) between textual reasoning and code generation, leaving symbolic computing capabilities underutilized. We introduce CodeSteer, an effective method for guiding LLM…

Computation and Language · Computer Science 2025-05-30 Yongchao Chen, Yilun Hao, Yueying Liu, Yang Zhang, Chuchu Fan

Block Shelves for Visual Programming Languages

The blocks editor, such as the editor in Scratch, is now widely used for visual programming languages (VPLs). Although it is friendly to non-programmers, it has three main limitations when displaying block code: (1) the…

Human-Computer Interaction · Computer Science 2016-05-04 Sheng-yi Hsu, Yuan-fu Lou, Chuen-tsai Sun

PerfRL: A Small Language Model Framework for Efficient Code Optimization

Code optimization is a challenging task requiring a substantial level of expertise from developers. Nonetheless, this level of human capacity is not sufficient considering the rapid evolution of new hardware architectures and software…

Machine Learning · Computer Science 2025-03-11 Shukai Duan, Nikos Kanakaris, Xiongye Xiao, Heng Ping, Chenyu Zhou, Nesreen K. Ahmed, Guixiang Ma, Mihai Capota, Theodore L. Willke, Shahin Nazarian, Paul Bogdan

Hierarchical coded elastic computing

Elasticity is offered by cloud service providers to exploit under-utilized computing resources. The low-cost elastic nodes can leave and join any time during the computation cycle. The possibility of elastic events occurring together with…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-24 Shahrzad Kiani, Tharindu Adikari, Stark C. Draper

FusionStitching: Deep Fusion and Code Generation for Tensorflow Computations on GPUs

In recent years, there has been a surge of machine learning applications in industry. Many of them are based on popular AI frameworks such as TensorFlow, Torch, Caffe, or MXNet, and are powered by accelerator platforms such as GPUs. One…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-14 Guoping Long, Jun Yang, Kai Zhu, Wei Lin

FusionStitching: Boosting Execution Efficiency of Memory Intensive Computations for DL Workloads

Performance optimization is the art of continuously seeking a harmonious mapping between the application domain and hardware. Recent years have witnessed a surge of deep learning (DL) applications in industry. Conventional wisdom for…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-27 Guoping Long, Jun Yang, Wei Lin

Learning Linear Block Codes with Gradient Quantization

This study investigates the problem of learning linear block codes optimized for belief-propagation decoders, significantly improving performance compared to the state of the art. Our previous research is extended with an enhanced system…

Signal Processing · Electrical Eng. & Systems 2025-10-02 Louis-Adrien Dufrène, Quentin Lampin, Guillaume Larue

BOLT: A Practical Binary Optimizer for Data Centers and Beyond

Performance optimization for large-scale applications has recently become more important as computation continues to move towards data centers. Data-center applications are generally very large and complex, which makes code layout an…

Programming Languages · Computer Science 2018-10-16 Maksim Panchenko, Rafael Auler, Bill Nell, Guilherme Ottoni

Performance Characterization and Optimizations of Traditional ML Applications

Even in the era of Deep Learning based methods, traditional machine learning methods with large data sets continue to attract significant attention. However, we find an apparent lack of a detailed performance characterization of these…

Performance · Computer Science 2024-12-30 Harsh Kumar, R. Govindarajan

Memory Hierarchy Sensitive Graph Layout

Mining large graphs for information is becoming an increasingly important workload due to the plethora of graph structured data becoming available. An aspect of graph algorithms that has hitherto not received much interest is the effect of…

Data Structures and Algorithms · Computer Science 2012-03-27 Amitabha Roy

Pickle Prefetcher: Programmable and Scalable Last-Level Cache Prefetcher

Modern high-performance architectures employ large last-level caches (LLCs). While large LLCs can reduce average memory access latency for workloads with a high degree of locality, they can also increase latency for workloads with irregular…

Hardware Architecture · Computer Science 2025-11-26 Hoa Nguyen, Pongstorn Maidee, Jason Lowe-Power, Alireza Kaviani

A Sequential Approximation Framework for Coded Distributed Optimization

Building on the previous work of Lee et al. and Ferdinand et al. on coded computation, we propose a sequential approximation framework for solving optimization problems in a distributed manner. In a distributed computation system, latency…

Information Theory · Computer Science 2017-10-26 Jingge Zhu, Ye Pu, Vipul Gupta, Claire Tomlin, Kannan Ramchandran

FusionStitching: Boosting Memory Intensive Computations for Deep Learning Workloads

We show in this work that memory intensive computations can result in severe performance problems due to off-chip memory access and CPU-GPU context switch overheads in a wide range of deep learning models. For this problem, current…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-20 Zhen Zheng, Pengzhan Zhao, Guoping Long, Feiwen Zhu, Kai Zhu, Wenyi Zhao, Lansong Diao, Jun Yang, Wei Lin

Should AI Optimize Your Code? A Comparative Study of Classical Optimizing Compilers Versus Current Large Language Models

Traditional optimizing compilers have played an important role in adapting to the growing complexity of modern software systems. The need for efficient parallel programming in current architectures requires strong optimization techniques.…

Artificial Intelligence · Computer Science 2025-04-03 Miguel Romero Rosas, Miguel Torres Sanchez, Rudolf Eigenmann