Related papers: DYNAMAP: Dynamic Algorithm Mapping Framework for L…

PENDRAM: Enabling High-Performance and Energy-Efficient Processing of Deep Neural Networks through a Generalized DRAM Data Mapping Policy

Convolutional Neural Networks (CNNs), a prominent type of Deep Neural Networks (DNNs), have emerged as a state-of-the-art solution for solving machine learning tasks. To improve the performance and energy efficiency of CNN inference, the…

Hardware Architecture · Computer Science 2024-08-06 Rachmad Vidya Wicaksana Putra, Muhammad Abdullah Hanif, Muhammad Shafique

Achieving Super-Linear Speedup across Multi-FPGA for Real-Time DNN Inference

Real-time Deep Neural Network (DNN) inference with low-latency requirement has become increasingly important for numerous applications in both cloud computing (e.g., Apple's Siri) and edge computing (e.g., Google/Waymo's driverless car).…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-11 Weiwen Jiang, Edwin H.-M. Sha, Xinyi Zhang, Lei Yang, Qingfeng Zhuge, Yiyu Shi, Jingtong Hu

Continuous-Flow Data-Rate-Aware CNN Inference on FPGA

Among hardware accelerators for deep-learning inference, data flow implementations offer low latency and high throughput capabilities. In these architectures, each neuron is mapped to a dedicated hardware unit, making them well-suited for…

Machine Learning · Computer Science 2026-03-10 Tobias Habermann, Michael Mecik, Zhenyu Wang, César David Vera, Martin Kumm, Mario Garrido

DRMap: A Generic DRAM Data Mapping Policy for Energy-Efficient Processing of Convolutional Neural Networks

Many convolutional neural network (CNN) accelerators face performance- and energy-efficiency challenges which are crucial for embedded implementations, due to high DRAM access latency and energy. Recently, some DRAM architectures have been…

Hardware Architecture · Computer Science 2023-03-06 Rachmad Vidya Wicaksana Putra, Muhammad Abdullah Hanif, Muhammad Shafique

Fast-OverlaPIM: A Fast Overlap-driven Mapping Framework for Processing In-Memory Neural Network Acceleration

Processing in-memory (PIM) is promising to accelerate neural networks (NNs) because it minimizes data movement and provides large computational parallelism. Similar to machine learning accelerators, application mapping, which determines the…

Hardware Architecture · Computer Science 2024-07-02 Xuan Wang, Minxuan Zhou, Tajana Rosing

Optimization of FPGA-based CNN Accelerators Using Metaheuristics

In recent years, convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields and with accuracy that was not possible before. However, this comes with extensive computational requirements, which made…

Neural and Evolutionary Computing · Computer Science 2022-09-26 Sadiq M. Sait, Aiman El-Maleh, Mohammad Altakrouri, Ahmad Shawahna

f-CNN$^{\text{x}}$: A Toolflow for Mapping Multi-CNN Applications on FPGAs

The predictive power of Convolutional Neural Networks (CNNs) has been an integral factor for emerging latency-sensitive applications, such as autonomous drones and vehicles. Such systems employ multiple CNNs, each one trained for a…

Computer Vision and Pattern Recognition · Computer Science 2021-06-09 Stylianos I. Venieris, Christos-Savvas Bouganis

Dataflow Aware Mapping of Convolutional Neural Networks Onto Many-Core Platforms With Network-on-Chip Interconnect

Machine intelligence, especially using convolutional neural networks (CNNs), has become a large area of research over the past years. Increasingly sophisticated hardware accelerators are proposed that exploit e.g. the sparsity in…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-23 Andreas Bytyn, René Ahlsdorf, Rainer Leupers, Gerd Ascheid

Layer-specific Optimization for Mixed Data Flow with Mixed Precision in FPGA Design for CNN-based Object Detectors

Convolutional neural networks (CNNs) require both intensive computation and frequent memory access, which lead to a low processing speed and large power dissipation. Although the characteristics of the different layers in a CNN are…

Computer Vision and Pattern Recognition · Computer Science 2020-09-04 Duy Thanh Nguyen, Hyun Kim, Hyuk-Jae Lee

COAC: Cross-layer Optimization of Accelerator Configurability for Efficient CNN Processing

To achieve high accuracy, convolutional neural networks (CNNs) are increasingly growing in complexity and diversity in layer types and topologies. This makes it very challenging to efficiently deploy such networks on custom processor…

Systems and Control · Electrical Eng. & Systems 2024-06-21 Steven Colleman, Man Shi, Marian Verhelst

Memory-Efficient CNN Accelerator Based on Interlayer Feature Map Compression

Existing deep convolutional neural networks (CNNs) generate massive interlayer feature data during network inference. To maintain real-time processing in embedded systems, large on-chip memory is required to buffer the interlayer feature…

Hardware Architecture · Computer Science 2021-10-13 Zhuang Shao, Xiaoliang Chen, Li Du, Lei Chen, Yuan Du, Wei Zhuang, Huadong Wei, Chenjia Xie, Zhongfeng Wang

A flexible FPGA accelerator for convolutional neural networks

Though CNNs are highly parallel workloads, in the absence of efficient on-chip memory reuse techniques, an accelerator for them quickly becomes memory bound. In this paper, we propose a CNN accelerator design for inference that is able to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-26 Kingshuk Majumder, Shubham Nema, Uday Bondhugula

A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA

In recent years deep learning algorithms have shown extremely high performance on machine learning tasks such as image classification and speech recognition. In support of such applications, various FPGA accelerator architectures have been…

Machine Learning · Computer Science 2017-05-09 Xinyu Zhang, Srinjoy Das, Ojash Neopane, Ken Kreutz-Delgado

CNNLab: a Novel Parallel Framework for Neural Networks using GPU and FPGA-a Practical Study with Trade-off Analysis

Designing and implementing efficient, provably correct parallel neural network processing is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and Pthreads…

Machine Learning · Computer Science 2016-06-21 Maohua Zhu, Liu Liu, Chao Wang, Yuan Xie

Design of High-Throughput Mixed-Precision CNN Accelerators on FPGA

Convolutional Neural Networks (CNNs) reach high accuracies in various application domains, but require large amounts of computation and incur costly data movements. One method to decrease these costs while trading accuracy is weight and/or…

Hardware Architecture · Computer Science 2022-08-10 Cecilia Latotzke, Tim Ciesielski, Tobias Gemmeke

ENAS4D: Efficient Multi-stage CNN Architecture Search for Dynamic Inference

Dynamic inference is a feasible way to reduce the computational cost of convolutional neural networks (CNNs), which can dynamically adjust the computation for each input sample. One of the ways to achieve dynamic inference is to use…

Computer Vision and Pattern Recognition · Computer Science 2020-09-22 Zhihang Yuan, Xin Liu, Bingzhe Wu, Guangyu Sun

Memory-Efficient Dataflow Inference for Deep CNNs on FPGA

Custom dataflow Convolutional Neural Network (CNN) inference accelerators on FPGA are tailored to a specific CNN topology and store parameters in On-Chip Memory (OCM), resulting in high energy efficiency and low inference latency. However,…

Hardware Architecture · Computer Science 2020-11-17 Lucian Petrica, Tobias Alonso, Mairin Kroes, Nicholas Fraser, Sorin Cotofana, Michaela Blott

REMAP: Multi-layer entropy-guided pooling of dense CNN features for image retrieval

This paper addresses the problem of very large-scale image retrieval, focusing on improving its accuracy and robustness. We target enhanced robustness of search to factors such as variations in illumination, object appearance and scale,…

Computer Vision and Pattern Recognition · Computer Science 2019-06-18 Syed Sameed Husain, Miroslaw Bober

Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators

Energy efficiency and memory footprint of a convolutional neural network (CNN) implemented on a CNN inference accelerator depend on many factors, including a weight quantization strategy (i.e., data types and bit-widths) and mapping (i.e.,…

Hardware Architecture · Computer Science 2025-07-23 Jan Klhufek, Miroslav Safar, Vojtech Mrazek, Zdenek Vasicek, Lukas Sekanina

MPNA: A Massively-Parallel Neural Array Accelerator with Dataflow Optimization for Convolutional Neural Networks

The state-of-the-art accelerators for Convolutional Neural Networks (CNNs) typically focus on accelerating only the convolutional layers, but do not prioritize the fully-connected layers much. Hence, they lack a synergistic optimization of…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-01 Muhammad Abdullah Hanif, Rachmad Vidya Wicaksana Putra, Muhammad Tanvir, Rehan Hafiz, Semeen Rehman, Muhammad Shafique