Related papers: DYNAMAP: Dynamic Algorithm Mapping Framework for L…

200 papers

Convolutional Neural Networks (CNNs), a prominent type of Deep Neural Networks (DNNs), have emerged as a state-of-the-art solution for solving machine learning tasks. To improve the performance and energy efficiency of CNN inference, the…

Hardware Architecture · Computer Science 2024-08-06 Rachmad Vidya Wicaksana Putra , Muhammad Abdullah Hanif , Muhammad Shafique

Real-time Deep Neural Network (DNN) inference with low-latency requirement has become increasingly important for numerous applications in both cloud computing (e.g., Apple's Siri) and edge computing (e.g., Google/Waymo's driverless car).…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-11 Weiwen Jiang , Edwin H.-M. Sha , Xinyi Zhang , Lei Yang , Qingfeng Zhuge , Yiyu Shi , Jingtong Hu

Among hardware accelerators for deep-learning inference, data flow implementations offer low latency and high throughput capabilities. In these architectures, each neuron is mapped to a dedicated hardware unit, making them well-suited for…

Machine Learning · Computer Science 2026-03-10 Tobias Habermann , Michael Mecik , Zhenyu Wang , César David Vera , Martin Kumm , Mario Garrido

Many convolutional neural network (CNN) accelerators face performance- and energy-efficiency challenges which are crucial for embedded implementations, due to high DRAM access latency and energy. Recently, some DRAM architectures have been…

Hardware Architecture · Computer Science 2023-03-06 Rachmad Vidya Wicaksana Putra , Muhammad Abdullah Hanif , Muhammad Shafique

Processing in-memory (PIM) is promising to accelerate neural networks (NNs) because it minimizes data movement and provides large computational parallelism. Similar to machine learning accelerators, application mapping, which determines the…

Hardware Architecture · Computer Science 2024-07-02 Xuan Wang , Minxuan Zhou , Tajana Rosing

In recent years, convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields and with accuracy that was not possible before. However, this comes with extensive computational requirements, which made…

Neural and Evolutionary Computing · Computer Science 2022-09-26 Sadiq M. Sait , Aiman El-Maleh , Mohammad Altakrouri , Ahmad Shawahna

The predictive power of Convolutional Neural Networks (CNNs) has been an integral factor for emerging latency-sensitive applications, such as autonomous drones and vehicles. Such systems employ multiple CNNs, each one trained for a…

Computer Vision and Pattern Recognition · Computer Science 2021-06-09 Stylianos I. Venieris , Christos-Savvas Bouganis

Machine intelligence, especially using convolutional neural networks (CNNs), has become a large area of research over the past years. Increasingly sophisticated hardware accelerators are proposed that exploit e.g. the sparsity in…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-23 Andreas Bytyn , René Ahlsdorf , Rainer Leupers , Gerd Ascheid

Convolutional neural networks (CNNs) require both intensive computation and frequent memory access, which lead to a low processing speed and large power dissipation. Although the characteristics of the different layers in a CNN are…

Computer Vision and Pattern Recognition · Computer Science 2020-09-04 Duy Thanh Nguyen , Hyun Kim , Hyuk-Jae Lee

To achieve high accuracy, convolutional neural networks (CNNs) are increasingly growing in complexity and diversity in layer types and topologies. This makes it very challenging to efficiently deploy such networks on custom processor…

Systems and Control · Electrical Eng. & Systems 2024-06-21 Steven Colleman , Man Shi , Marian Verhelst

Existing deep convolutional neural networks (CNNs) generate massive interlayer feature data during network inference. To maintain real-time processing in embedded systems, large on-chip memory is required to buffer the interlayer feature…

Hardware Architecture · Computer Science 2021-10-13 Zhuang Shao , Xiaoliang Chen , Li Du , Lei Chen , Yuan Du , Wei Zhuang , Huadong Wei , Chenjia Xie , Zhongfeng Wang

Though CNNs are highly parallel workloads, in the absence of efficient on-chip memory reuse techniques, an accelerator for them quickly becomes memory bound. In this paper, we propose a CNN accelerator design for inference that is able to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-26 Kingshuk Majumder , Shubham Nema , Uday Bondhugula

In recent years deep learning algorithms have shown extremely high performance on machine learning tasks such as image classification and speech recognition. In support of such applications, various FPGA accelerator architectures have been…

Machine Learning · Computer Science 2017-05-09 Xinyu Zhang , Srinjoy Das , Ojash Neopane , Ken Kreutz-Delgado

Designing and implementing efficient, provably correct parallel neural network processing is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and Pthreads…

Machine Learning · Computer Science 2016-06-21 Maohua Zhu , Liu Liu , Chao Wang , Yuan Xie

Convolutional Neural Networks (CNNs) reach high accuracies in various application domains, but require large amounts of computation and incur costly data movements. One method to decrease these costs while trading accuracy is weight and/or…

Hardware Architecture · Computer Science 2022-08-10 Cecilia Latotzke , Tim Ciesielski , Tobias Gemmeke

Dynamic inference is a feasible way to reduce the computational cost of a convolutional neural network (CNN) by dynamically adjusting the computation for each input sample. One of the ways to achieve dynamic inference is to use…

Computer Vision and Pattern Recognition · Computer Science 2020-09-22 Zhihang Yuan , Xin Liu , Bingzhe Wu , Guangyu Sun

Custom dataflow Convolutional Neural Network (CNN) inference accelerators on FPGA are tailored to a specific CNN topology and store parameters in On-Chip Memory (OCM), resulting in high energy efficiency and low inference latency. However,…

Hardware Architecture · Computer Science 2020-11-17 Lucian Petrica , Tobias Alonso , Mairin Kroes , Nicholas Fraser , Sorin Cotofana , Michaela Blott

This paper addresses the problem of very large-scale image retrieval, focusing on improving its accuracy and robustness. We target enhanced robustness of search to factors such as variations in illumination, object appearance and scale,…

Computer Vision and Pattern Recognition · Computer Science 2019-06-18 Syed Sameed Husain , Miroslaw Bober

Energy efficiency and memory footprint of a convolutional neural network (CNN) implemented on a CNN inference accelerator depend on many factors, including a weight quantization strategy (i.e., data types and bit-widths) and mapping (i.e.,…

Hardware Architecture · Computer Science 2025-07-23 Jan Klhufek , Miroslav Safar , Vojtech Mrazek , Zdenek Vasicek , Lukas Sekanina

The state-of-the-art accelerators for Convolutional Neural Networks (CNNs) typically focus on accelerating only the convolutional layers, but do not prioritize the fully-connected layers much. Hence, they lack a synergistic optimization of…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-01 Muhammad Abdullah Hanif , Rachmad Vidya Wicaksana Putra , Muhammad Tanvir , Rehan Hafiz , Semeen Rehman , Muhammad Shafique