Related papers: Compiling Halide Programs to Push-Memory Accelerat…

ImaGen: A General Framework for Generating Memory- and Power-Efficient Image Processing Accelerators

Image processing algorithms are prime targets for hardware acceleration as they are commonly used in resource- and power-limited applications. Today's image processing accelerator designs make rigid assumptions about the algorithm…

Hardware Architecture · Computer Science 2023-04-10 Nisarg Ujjainkar, Jingwen Leng, Yuhao Zhu

Programming Heterogeneous Systems from an Image Processing DSL

Specialized image processing accelerators are necessary to deliver the performance and energy efficiency required by important applications in computer vision, computational photography, and augmented reality. But creating,…

Software Engineering · Computer Science 2016-11-01 Jing Pu, Steven Bell, Xuan Yang, Jeff Setter, Stephen Richardson, Jonathan Ragan-Kelley, Mark Horowitz

Compiling Neural Networks for a Computational Memory Accelerator

Computational memory (CM) is a promising approach for accelerating inference on neural networks (NN) by using enhanced memories that, in addition to storing data, allow computations on them. One of the main challenges of this approach is…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-27 Kornilios Kourtis, Martino Dazzi, Nikolas Ioannou, Tobias Grosser, Abu Sebastian, Evangelos Eleftheriou

Compiler Infrastructure for Specializing Domain-Specific Memory Templates

Specialized hardware accelerators are becoming important for more and more applications. Thanks to specialization, they can achieve high performance and energy efficiency but their design is complex and time consuming. This problem is…

Hardware Architecture · Computer Science 2021-04-06 Stephanie Soldavini, Christian Pilato

Pushing Tensor Accelerators Beyond MatMul in a User-Schedulable Language

Tensor accelerators now represent a growing share of compute resources in modern CPUs and GPUs. However, they are hard to program, leading developers to use vendor-provided kernel libraries that support tensor accelerators. As a result, the…

Programming Languages · Computer Science 2026-02-12 Yihong Zhang, Derek Gerstmann, Andrew Adams, Maaz Bin Safeer Ahmad

Categorization of Program Regions for Agile Compilation using Machine Learning and Hardware Support

A compiler processes the code written in a high level language and produces machine executable code. The compiler writers often face the challenge of keeping the compilation times reasonable. That is because aggressive optimization passes…

Programming Languages · Computer Science 2019-05-30 Sanket Tavarageri

CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators

In recent years, various computing-in-memory (CIM) processors have been presented, showing superior performance over traditional architectures. To unleash the potential of various CIM architectures, such as device precision, crossbar size,…

Hardware Architecture · Computer Science 2024-05-09 Songyun Qu, Shixin Zhao, Bing Li, Yintao He, Xuyi Cai, Lei Zhang, Ying Wang

Memory-Guided Unified Hardware Accelerator for Mixed-Precision Scientific Computing

Recent hardware acceleration advances have enabled powerful specialized accelerators for finite element computations, spiking neural network inference, and sparse tensor operations. However, existing approaches face fundamental limitations:…

Hardware Architecture · Computer Science 2026-01-09 Chuanzhen Wang, Leo Zhang, Eric Liu

Demystifying Memory Access Patterns of FPGA-Based Graph Processing Accelerators

Recent advances in reprogrammable hardware (e.g., FPGAs) and memory technology (e.g., DDR4, HBM) promise to solve performance problems inherent to graph processing like irregular memory access patterns on traditional hardware (e.g., CPU).…

Hardware Architecture · Computer Science 2021-04-19 Jonas Dann, Daniel Ritter, Holger Fröning

Leveraging Neural Graph Compilers in Machine Learning Research for Edge-Cloud Systems

This work presents a comprehensive evaluation of neural network graph compilers across heterogeneous hardware platforms, addressing the critical gap between theoretical optimization techniques and practical deployment scenarios. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-30 Alireza Furutanpey, Carmen Walser, Philipp Raith, Pantelis A. Frangoudis, Schahram Dustdar

AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-24 Jason Cong, Peng Wei, Cody Hao Yu, Peng Zhang

AccD: A Compiler-based Framework for Accelerating Distance-related Algorithms on CPU-FPGA Platforms

As a promising solution to boost the performance of distance-related algorithms (e.g., K-means and KNN), FPGA-based acceleration attracts lots of attention, but also comes with numerous challenges. In this work, we propose AccD, a…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-02 Yuke Wang, Boyuan Feng, Gushu Li, Lei Deng, Yuan Xie, Yufei Ding

Transparent Compiler and Runtime Specializations for Accelerating Managed Languages on FPGAs

In recent years, heterogeneous computing has emerged as the vital way to increase computers' performance and energy efficiency by combining diverse hardware devices, such as Graphics Processing Units (GPUs) and Field Programmable Gate…

Programming Languages · Computer Science 2020-11-02 Michail Papadimitriou, Juan Fumero, Athanasios Stratikopoulos, Foivos S. Zakkak, Christos Kotselidis

HIR: An MLIR-based Intermediate Representation for Hardware Accelerator Description

The emergence of machine learning, image and audio processing on edge devices has motivated research towards power efficient custom hardware accelerators. Though FPGAs are an ideal target for energy efficient custom accelerators, the…

Hardware Architecture · Computer Science 2021-03-02 Kingshuk Majumder, Uday Bondhugula

A High-Level Compiler Integration Approach for Deep Learning Accelerators Supporting Abstraction and Optimization

The growing adoption of domain-specific architectures in edge computing platforms for deep learning has highlighted the efficiency of hardware accelerators. However, integrating custom accelerators into modern machine learning (ML)…

Machine Learning · Computer Science 2025-07-08 Samira Ahmadifarsani, Daniel Mueller-Gritschneder, Ulf Schlichtmann

Bridging the Gap Between Neural Networks and Neuromorphic Hardware with A Neural Network Compiler

Unlike developing neural networks (NNs) for general-purpose processors, development for NN chips usually faces some hardware-specific restrictions, such as limited precision of network signals and parameters, constrained…

Neural and Evolutionary Computing · Computer Science 2018-01-19 Yu Ji, YouHui Zhang, WenGuang Chen, Yuan Xie

A Unified Programming Model for Heterogeneous Computing with CPU and Accelerator Technologies

This paper consists of three parts. The first part provides a unified programming model for heterogeneous computing with CPU and accelerator (like GPU, FPGA, Google TPU, Atos QPU, and more) technologies. To some extent, this new programming…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-31 Yuqing Xiong

DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration

Overlays have shown significant promise for field-programmable gate-arrays (FPGAs) as they allow for fast development cycles and remove many of the challenges of the traditional FPGA hardware design flow. However, this often comes with a…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-18 Mohamed S. Abdelfattah, David Han, Andrew Bitar, Roberto DiCecco, Shane OConnell, Nitika Shanker, Joseph Chu, Ian Prins, Joshua Fender, Andrew C. Ling, Gordon R. Chiu

Uni-Render: A Unified Accelerator for Real-Time Rendering Across Diverse Neural Renderers

Recent advancements in neural rendering technologies and their supporting devices have paved the way for immersive 3D experiences, significantly transforming human interaction with intelligent devices across diverse applications. However,…

Graphics · Computer Science 2025-04-01 Chaojian Li, Sixu Li, Linrui Jiang, Jingqun Zhang, Yingyan Celine Lin

Towards High Performance, Portability, and Productivity: Lightweight Augmented Neural Networks for Performance Prediction

Writing high-performance code requires significant expertise in the programming language, compiler optimizations, and hardware knowledge. This often leads to poor productivity and portability and is inconvenient for a non-programmer…

Performance · Computer Science 2020-09-01 Ajitesh Srivastava, Naifeng Zhang, Rajgopal Kannan, Viktor K. Prasanna