Related papers: CFU Playground: Full-Stack Open-Source Framework f…

200 papers

We present our development experience and recent results for the MLPerf Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms. We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware…

Parameterizable machine learning (ML) accelerators are the product of recent breakthroughs in ML. To fully enable their design space exploration (DSE), we propose a physical-design-driven, learning-based prediction framework for…

This paper introduces a novel optimization framework for deep neural network (DNN) hardware accelerators, enabling the rapid development of customized and automated design flows. More specifically, our approach aims to automate the…

Machine Learning · Computer Science 2023-11-08 Zhiqiang Que, Shuo Liu, Markus Rognlien, Ce Guo, Jose G. F. Coutinho, Wayne Luk

In recent years, Convolutional Neural Network (CNN) based methods have achieved great success in a large number of applications and have been among the most powerful and widely used techniques in computer vision. However, CNN-based methods…

Machine Learning · Computer Science 2019-11-18 Ali Jahanshahi

The scientific community increasingly relies on machine learning (ML) for near-sensor processing, leveraging its strengths in tasks such as pattern recognition, anomaly detection, and real-time decision-making. These deployments demand…

Hardware Architecture · Computer Science 2026-03-30 G Abarajithan, Zhenghua Ma, Ravidu Munasinghe, Francesco Restuccia, Ryan Kastner

Offloading compute-intensive nested loops to execute on FPGA accelerators has been demonstrated by numerous researchers as an effective performance enhancement technique across numerous application domains. To construct such accelerators…

Hardware Architecture · Computer Science 2015-09-02 Cheng Liu, Ho-Cheung Ng, Hayden Kwok-Hay So

The increase in open-source availability of Large Language Models (LLMs) has enabled users to deploy them on more and more resource-constrained edge devices to reduce reliance on network connections and provide more privacy. However, the…

Hardware Architecture · Computer Science 2024-08-02 Jude Haris, Rappy Saha, Wenhao Hu, José Cano

Deep learning inference on embedded devices is a burgeoning field with myriad applications because tiny embedded devices are omnipresent. But we must overcome major challenges before we can benefit from this opportunity. Embedded processors…

Mixed-precision neural networks (MPNNs) that enable the use of just enough data width for a deep learning task promise significant advantages of both inference accuracy and computing overhead. FPGAs with fine-grained reconfiguration…

Hardware Architecture · Computer Science 2023-08-23 Erjing Luo, Haitong Huang, Cheng Liu, Guoyu Li, Bing Yang, Ying Wang, Huawei Li, Xiaowei Li

Convolutional Neural Networks (CNNs) are fundamental to deep learning, driving applications across various domains. However, their growing complexity has significantly increased computational demands, necessitating efficient hardware…

Machine Learning · Computer Science 2025-05-21 Junye Jiang, Yaan Zhou, Yuanhao Gong, Haoxuan Yuan, Shuanglong Liu

Embedded Field-Programmable Gate Arrays (eFPGAs) allow for the design of hardware accelerators of edge Machine Learning (ML) applications at a lower power budget compared with traditional FPGA platforms. However, the limited eFPGA logic and…

Hardware Architecture · Computer Science 2025-02-13 Tousif Rahman, Gang Mao, Bob Pattison, Sidharth Maheshwari, Marcos Sartori, Adrian Wheeldon, Rishad Shafik, Alex Yakovlev

The increasing complexity and diversity of hardware accelerators in modern computing systems demand flexible, low-overhead program analysis tools. We present PASTA, a low-overhead and modular Program AnalysiS Tool Framework for…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-26 Mao Lin, Hyeran Jeon, Keren Zhou

Recent progress in artificial intelligence (AI) and high-performance computing (HPC) has brought potentially game-changing opportunities in accelerating reactive flow simulations. In this study, we introduce an open-source computational…

Computational Engineering, Finance, and Science · Computer Science 2023-12-22 Runze Mao, Yingrui Wang, Min Zhang, Han Li, Jiayang Xu, Xinyu Dong, Yan Zhang, Zhi X. Chen

Customized accelerators have revolutionized modern computing by delivering substantial gains in energy efficiency and performance through hardware specialization. Field-Programmable Gate Arrays (FPGAs) play a crucial role in this paradigm,…

Hardware Architecture · Computer Science 2025-09-25 Stéphane Pouget, Michael Lo, Louis-Noël Pouchet, Jason Cong

The paradigm shift towards local and on-device inference under stringent resource constraints is represented by the tiny machine learning (TinyML) domain. The primary goal of TinyML is to integrate intelligence into tiny, low-cost devices…

In this paper, we present the new FPGA EMUlation (FEMU), an open-source and configurable emulation framework for prototyping and evaluating TinyAI heterogeneous systems (HS). FEMU leverages the capability of system-on-chip (SoC)-based FPGAs…

FPGA overlays are commonly implemented as coarse-grained reconfigurable architectures with a goal to improve designers' productivity through balancing flexibility and ease of configuration of the underlying fabric. To truly facilitate full…

Hardware Architecture · Computer Science 2016-06-22 Ho-Cheung Ng, Cheng Liu, Hayden Kwok-Hay So

The relentless advancement of artificial intelligence (AI) and machine learning (ML) applications necessitates the development of specialized hardware accelerators capable of handling the increasing complexity and computational demands.…

Hardware Architecture · Computer Science 2024-03-20 Hongwu Peng, Caiwen Ding, Tong Geng, Sutanay Choudhury, Kevin Barker, Ang Li

CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-24 Jason Cong, Peng Wei, Cody Hao Yu, Peng Zhang

Large language models (LLMs) have demonstrated remarkable abilities in natural language processing. However, their deployment on resource-constrained embedded devices remains difficult due to memory and computational demands. In this paper,…

Hardware Architecture · Computer Science 2024-09-19 Han Xu, Yutong Li, Shihao Ji