Related papers: An Improving Method for Loop Unrolling

Loop Unrolling in Multi-pipeline ASIP Design

Application Specific Instruction-set Processor (ASIP) is one of the popular processor design techniques for embedded systems which allows customizability in processor design without overly hindering design flexibility. Multi-pipeline ASIPs…

Programming Languages · Computer Science 2014-02-05 Rajitha Navarathna, Swarnalatha Radhakrishnan, Roshan Ragel

Inner Loop Optimizations in Mapping Single Threaded Programs to Hardware

In the context of mapping high-level algorithms to hardware, we consider the basic problem of generating an efficient hardware implementation of a single threaded program, in particular, that of an inner loop. We describe a control-flow…

Hardware Architecture · Computer Science 2014-11-05 Madhav Desai

A Proposal for Loop-Transformation Pragmas

Pragmas for loop transformations, such as unrolling, are implemented in most mainstream compilers. They are used by application programmers because of their ease of use compared to directly modifying the source code of the relevant loops.…

Programming Languages · Computer Science 2019-01-31 Michael Kruse, Hal Finkel
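
For readers unfamiliar with the transformation that such pragmas request, here is a minimal hand-written sketch of unrolling by a factor of four. It is only an illustration, not code from any of the listed papers; the function names `sum_rolled` and `sum_unrolled_by_4` are hypothetical, and a compiler would normally perform this rewrite itself.

```python
def sum_rolled(values):
    """Baseline loop: one element and one loop test per iteration."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def sum_unrolled_by_4(values):
    """Same computation with the body replicated four times (illustrative)."""
    n = len(values)
    total = 0
    i = 0
    # Main unrolled loop: covers the largest multiple of 4 not exceeding n.
    limit = n - (n % 4)
    while i < limit:
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    # Remainder loop: handles the 0 to 3 leftover elements.
    while i < n:
        total += values[i]
        i += 1
    return total
```

The unrolled version performs roughly a quarter of the loop tests at the cost of a larger body and a separate remainder loop. Whether that trade pays off depends on the target, which is why the decision is usually left to the compiler or expressed through a pragma.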

Super-Linear Speedup by Generalizing Runtime Repeated Recursion Unfolding in Prolog

Runtime repeated recursion unfolding was recently introduced as a just-in-time program transformation strategy that can achieve super-linear speedup. So far, the method was restricted to single linear direct recursive rules in the…

Programming Languages · Computer Science 2025-03-14 Thom Fruehwirth

Energy-Efficiency Evaluation of OpenMP Loop Transformations and Runtime Constructs

OpenMP is the de facto API for parallel programming in HPC applications. These programs are often computed in data centers, where energy consumption is a major issue. Whereas previous work has focused almost entirely on performance, we here…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-12 Henrik Valter, Axel Karlsson, Miquel Pericàs

Repeated Recursion Unfolding for Super-Linear Speedup within Bounds

Repeated recursion unfolding is a new approach that repeatedly unfolds a recursion with itself and simplifies it while keeping all unfolded rules. Each unfolding doubles the number of recursive steps covered. This reduces the number of…

Programming Languages · Computer Science 2020-09-14 Thom Fruehwirth

ACPO: AI-Enabled Compiler Framework

The key to performance optimization of a program is to decide correctly when a certain transformation should be applied by a compiler. This is an ideal opportunity to apply machine-learning models to speed up the tuning process; while this…

Programming Languages · Computer Science 2025-01-15 Amir H. Ashouri, Muhammad Asif Manzoor, Duc Minh Vu, Raymond Zhang, Colin Toft, Ziwen Wang, Angel Zhang, Bryan Chan, Tomasz S. Czajkowski, Yaoqing Gao

Threads and Or-Parallelism Unified

One of the main advantages of Logic Programming (LP) is that it provides an excellent framework for the parallel execution of programs. In this work we investigate novel techniques to efficiently exploit parallelism from real-world…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-07-27 Vítor Santos Costa, Inês Dutra, Ricardo Rocha

Loop Unrolled Shallow Equilibrium Regularizer (LUSER) -- A Memory-Efficient Inverse Problem Solver

In inverse problems we aim to reconstruct some underlying signal of interest from potentially corrupted and often ill-posed measurements. Classical optimization-based techniques proceed by optimizing a data consistency metric together with…

Image and Video Processing · Electrical Eng. & Systems 2022-10-17 Peimeng Guan, Jihui Jin, Justin Romberg, Mark A. Davenport

Unrolled and Pipelined Decoders based on Look-Up Tables for Polar Codes

Unrolling a decoding algorithm allows to achieve extremely high throughput at the cost of increased area. Look-up tables (LUTs) can be used to replace functions otherwise implemented as circuits. In this work, we show the impact of…

Information Theory · Computer Science 2024-09-04 Pascal Giard, Syed Aizaz Ali Shah, Alexios Balatsoukas-Stimming, Maximilian Stark, Gerhard Bauch

Algorithm Unrolling: Interpretable, Efficient Deep Learning for Signal and Image Processing

Deep neural networks provide unprecedented performance gains in many real world problems in signal and image processing. Despite these gains, future development and practical deployment of deep networks is hindered by their blackbox nature,…

Image and Video Processing · Electrical Eng. & Systems 2020-08-10 Vishal Monga, Yuelong Li, Yonina C. Eldar

Compiler Phase Ordering as an Orthogonal Approach for Reducing Energy Consumption

Compiler writers typically focus primarily on the performance of the generated program binaries when selecting the passes and the order in which they are applied in the standard optimization levels, such as GCC -O3. In some domains, such as…

Performance · Computer Science 2018-07-03 Ricardo Nobre, Luís Reis, João M. P. Cardoso

Learning to Reformulate for Linear Programming

It has been verified that the linear programming (LP) is able to formulate many real-life optimization problems, which can obtain the optimum by resorting to corresponding solvers such as OptVerse, Gurobi and CPLEX. In the past decades, a…

Optimization and Control · Mathematics 2022-01-19 Xijun Li, Qingyu Qu, Fangzhou Zhu, Jia Zeng, Mingxuan Yuan, Kun Mao, Jie Wang

A Loop-Based Methodology for Reducing Computational Redundancy in Workload Sets

The design of general purpose processors relies heavily on a workload gathering step in which representative programs are collected from various application domains. Processor performance, when running the workload set, is profiled using…

Performance · Computer Science 2018-01-05 Elie M. Shaccour, Mohammad M. Mansour

Distributed Scheduling of Quantum Circuits with Noise and Time Optimization

Quantum computers are currently noisy, particularly without error correction and fault tolerance. Methods like error suppression and mitigation are widely used to improve performance. Circuit cutting, which partitions a circuit into smaller…

Quantum Physics · Physics 2025-07-03 Debasmita Bhoumik, Ritajit Majumdar, Amit Saha, Susmita Sur-Kolay

Enhancing the performance of Decoupled Software Pipeline through Backward Slicing

The rapidly increasing number of cores available in multicore processors does not necessarily lead directly to a commensurate increase in performance: programs written in conventional languages, such as C, need careful restructuring,…

Programming Languages · Computer Science 2015-01-28 Esraa Alwan, John Fitch, Julian Padget

Learning to Make Compiler Optimizations More Effective

Because loops execute their body many times, compiler developers place much emphasis on their optimization. Nevertheless, in view of highly diverse source code and hardware, compilers still struggle to produce optimal target code. The sheer…

Programming Languages · Computer Science 2021-03-01 Rahim Mammadli, Marija Selakovic, Felix Wolf, Michael Pradel

Intelligent-Unrolling: Exploiting Regular Patterns in Irregular Applications

Modern optimizing compilers are able to exploit memory access or computation patterns to generate vectorization codes. However, such patterns in irregular applications are unknown until runtime due to the input dependence. Thus, either…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-26 Changxi Liu, Hailong Yang, Xu Liu, Zhongzhi Luan, Depei Qian

Impact of parallel code optimization on computer power consumption

The increase in performance and power of computing systems requires the wider use of program optimizations. The goal of performing optimizations is not only to reduce program runtime, but also to reduce other computer resources including…

Mathematical Software · Computer Science 2023-12-07 E. A. Kiselev, P. N. Telegin, A. V. Baranov

A Novel Loop Fission Technique Inspired by Implicit Computational Complexity

This work explores an unexpected application of Implicit Computational Complexity (ICC) to parallelize loops in imperative programs. Thanks to a lightweight dependency analysis, our algorithm allows splitting a loop into multiple loops that…

Programming Languages · Computer Science 2022-06-20 Clément Aubert, Thomas Rubiano, Neea Rusch, Thomas Seiller