Related papers: CFU Playground: Full-Stack Open-Source Framework f…

Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark

We present our development experience and recent results for the MLPerf Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms. We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware…

Machine Learning · Computer Science 2022-06-24 Hendrik Borras, Giuseppe Di Guglielmo, Javier Duarte, Nicolò Ghielmetti, Ben Hawks, Scott Hauck, Shih-Chieh Hsu, Ryan Kastner, Jason Liang, Andres Meza, Jules Muhizi, Tai Nguyen, Rushil Roy, Nhan Tran, Yaman Umuroglu, Olivia Weng, Aidan Yokuda, Michaela Blott

An Open-Source ML-Based Full-Stack Optimization Framework for Machine Learning Accelerators

Parameterizable machine learning (ML) accelerators are the product of recent breakthroughs in ML. To fully enable their design space exploration (DSE), we propose a physical-design-driven, learning-based prediction framework for…

Machine Learning · Computer Science 2023-08-24 Hadi Esmaeilzadeh, Soroush Ghodrati, Andrew B. Kahng, Joon Kyung Kim, Sean Kinzer, Sayak Kundu, Rohan Mahapatra, Susmita Dey Manasi, Sachin Sapatnekar, Zhiang Wang, Ziqing Zeng

MetaML: Automating Customizable Cross-Stage Design-Flow for Deep Learning Acceleration

This paper introduces a novel optimization framework for deep neural network (DNN) hardware accelerators, enabling the rapid development of customized and automated design flows. More specifically, our approach aims to automate the…

Machine Learning · Computer Science 2023-11-08 Zhiqiang Que, Shuo Liu, Markus Rognlien, Ce Guo, Jose G. F. Coutinho, Wayne Luk

TinyCNN: A Tiny Modular CNN Accelerator for Embedded FPGA

In recent years, Convolutional Neural Network (CNN) based methods have achieved great success in a large number of applications and have been among the most powerful and widely used techniques in computer vision. However, CNN-based methods…

Machine Learning · Computer Science 2019-11-18 Ali Jahanshahi

CGRA4ML: A Hardware/Software Framework to Implement Neural Networks for Scientific Edge Computing

The scientific community increasingly relies on machine learning (ML) for near-sensor processing, leveraging its strengths in tasks such as pattern recognition, anomaly detection, and real-time decision-making. These deployments demand…

Hardware Architecture · Computer Science 2026-03-30 G Abarajithan, Zhenghua Ma, Ravidu Munasinghe, Francesco Restuccia, Ryan Kastner

Automatic Nested Loop Acceleration on FPGAs Using Soft CGRA Overlay

Offloading compute-intensive nested loops to execute on FPGA accelerators has been demonstrated by numerous researchers as an effective performance enhancement technique across numerous application domains. To construct such accelerators…

Hardware Architecture · Computer Science 2015-09-02 Cheng Liu, Ho-Cheung Ng, Hayden Kwok-Hay So

Designing Efficient LLM Accelerators for Edge Devices

The increase in open-source availability of Large Language Models (LLMs) has enabled users to deploy them on more and more resource-constrained edge devices to reduce reliance on network connections and provide more privacy. However, the…

Hardware Architecture · Computer Science 2024-08-02 Jude Haris, Rappy Saha, Wenhao Hu, José Cano

TensorFlow Lite Micro: Embedded Machine Learning on TinyML Systems

Deep learning inference on embedded devices is a burgeoning field with myriad applications because tiny embedded devices are omnipresent. But we must overcome major challenges before we can benefit from this opportunity. Embedded processors…

Machine Learning · Computer Science 2021-03-16 Robert David, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat Jeffries, Jian Li, Nick Kreeger, Ian Nappier, Meghna Natraj, Shlomi Regev, Rocky Rhodes, Tiezhen Wang, Pete Warden

DeepBurning-MixQ: An Open Source Mixed-Precision Neural Network Accelerator Design Framework for FPGAs

Mixed-precision neural networks (MPNNs) that enable the use of just enough data width for a deep learning task promise significant advantages of both inference accuracy and computing overhead. FPGAs with fine-grained reconfiguration…

Hardware Architecture · Computer Science 2023-08-23 Erjing Luo, Haitong Huang, Cheng Liu, Guoyu Li, Bing Yang, Ying Wang, Huawei Li, Xiaowei Li

FPGA-based Acceleration for Convolutional Neural Networks: A Comprehensive Review

Convolutional Neural Networks (CNNs) are fundamental to deep learning, driving applications across various domains. However, their growing complexity has significantly increased computational demands, necessitating efficient hardware…

Machine Learning · Computer Science 2025-05-21 Junye Jiang, Yaan Zhou, Yuanhao Gong, Haoxuan Yuan, Shuanglong Liu

Runtime Tunable Tsetlin Machines for Edge Inference on eFPGAs

Embedded Field-Programmable Gate Arrays (eFPGAs) allow for the design of hardware accelerators of edge Machine Learning (ML) applications at a lower power budget compared with traditional FPGA platforms. However, the limited eFPGA logic and…

Hardware Architecture · Computer Science 2025-02-13 Tousif Rahman, Gang Mao, Bob Pattison, Sidharth Maheshwari, Marcos Sartori, Adrian Wheeldon, Rishad Shafik, Alex Yakovlev

PASTA: A Modular Program Analysis Tool Framework for Accelerators

The increasing complexity and diversity of hardware accelerators in modern computing systems demand flexible, low-overhead program analysis tools. We present PASTA, a low-overhead and modular Program AnalysiS Tool Framework for…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-26 Mao Lin, Hyeran Jeon, Keren Zhou

An integrated framework for accelerating reactive flow simulation using GPU and machine learning models

Recent progress in artificial intelligence (AI) and high-performance computing (HPC) has brought potentially game-changing opportunities in accelerating reactive flow simulations. In this study, we introduce an open-source computational…

Computational Engineering, Finance, and Science · Computer Science 2023-12-22 Runze Mao, Yingrui Wang, Min Zhang, Han Li, Jiayang Xu, Xinyu Dong, Yan Zhang, Zhi X. Chen

Holistic Optimization Framework for FPGA Accelerators

Customized accelerators have revolutionized modern computing by delivering substantial gains in energy efficiency and performance through hardware specialization. Field-Programmable Gate Arrays (FPGAs) play a crucial role in this paradigm,…

Hardware Architecture · Computer Science 2025-09-25 Stéphane Pouget, Michael Lo, Louis-Noël Pouchet, Jason Cong

Co-Design of CNN Accelerators for TinyML using Approximate Matrix Decomposition

The paradigm shift towards local and on-device inference under stringent resource constraints is represented by the tiny machine learning (TinyML) domain. The primary goal of TinyML is to integrate intelligence into tiny, low-cost devices…

Hardware Architecture · Computer Science 2026-04-24 José Juan Hernández Morales, Georgios Mentzos, Frank Hannig, Konstantinos Balaskas, Georgios Zervakis, Jörg Henkel, Jürgen Teich

Invited Paper: FEMU: An Open-Source and Configurable Emulation Framework for Prototyping TinyAI Heterogeneous Systems

In this paper, we present the new FPGA EMUlation (FEMU), an open-source and configurable emulation framework for prototyping and evaluating TinyAI heterogeneous systems (HS). FEMU leverages the capability of system-on-chip (SoC)-based FPGAs…

Hardware Architecture · Computer Science 2025-08-26 Simone Machetti, Deniz Kasap, Juan Sapriza, Rubén Rodríguez Álvarez, Hossein Taji, José Miranda, Miguel Peón-Quirós, David Atienza

A Soft Processor Overlay with Tightly-coupled FPGA Accelerator

FPGA overlays are commonly implemented as coarse-grained reconfigurable architectures with a goal to improve designers' productivity through balancing flexibility and ease of configuration of the underlying fabric. To truly facilitate full…

Hardware Architecture · Computer Science 2016-06-22 Ho-Cheung Ng, Cheng Liu, Hayden Kwok-Hay So

Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs

The relentless advancement of artificial intelligence (AI) and machine learning (ML) applications necessitates the development of specialized hardware accelerators capable of handling the increasing complexity and computational demands.…

Hardware Architecture · Computer Science 2024-03-20 Hongwu Peng, Caiwen Ding, Tong Geng, Sutanay Choudhury, Kevin Barker, Ang Li

AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-24 Jason Cong, Peng Wei, Cody Hao Yu, Peng Zhang

LlamaF: An Efficient Llama2 Architecture Accelerator on Embedded FPGAs

Large language models (LLMs) have demonstrated remarkable abilities in natural language processing. However, their deployment on resource-constrained embedded devices remains difficult due to memory and computational demands. In this paper,…

Hardware Architecture · Computer Science 2024-09-19 Han Xu, Yutong Li, Shihao Ji