Related papers: LOCAL: Low-Complex Mapping Algorithm for Spatial D…

The Turbo-Charged Mapper: Fast and Optimal Mapping for Energy-efficient and Low-latency Accelerator Design

The energy and latency of an accelerator running a deep neural network (DNN) depend on how the computation and data movement are scheduled in the accelerator (i.e., mapping), and picking an optimal mapping is essential to achieve…

Hardware Architecture · Computer Science 2026-05-05 Michael Gilbert , Tanner Andrulis , Vivienne Sze , Joel S. Emer

Local Optimization of MAPF solutions on Directed Graphs

Among sub-optimal MAPF solvers, rule-based algorithms are particularly appealing since they are complete. Even in crowded scenarios, they allow finding a feasible solution that brings each agent to its target, preventing deadlock…

Optimization and Control · Mathematics 2024-04-10 S. Ardizzoni , I. Saccani , L. Consolini , M. Locatelli

Software-defined Design Space Exploration for an Efficient DNN Accelerator Architecture

Deep neural networks (DNNs) have been shown to outperform conventional machine learning algorithms across a wide range of applications, e.g., image recognition, object detection, robotics, and natural language processing. However, the high…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-23 Ye Yu , Yingmin Li , Shuai Che , Niraj K. Jha , Weifeng Zhang

Dynamic Programming based Local Search approaches for Multi-Agent Path Finding problems on Directed Graphs

Among sub-optimal Multi-Agent Path Finding (MAPF) solvers, rule-based algorithms are particularly appealing since they are complete. Even in crowded scenarios, they allow finding a feasible solution that brings each agent to its target,…

Multiagent Systems · Computer Science 2024-10-11 Irene Saccani , Stefano Ardizzoni , Luca Consolini , Marco Locatelli

MetaML-Pro: Cross-Stage Design Flow Automation for Efficient Deep Learning Acceleration

This paper presents a unified framework for codifying and automating optimization strategies to efficiently deploy deep neural networks (DNNs) on resource-constrained hardware, such as FPGAs, while maintaining high performance, accuracy,…

Hardware Architecture · Computer Science 2026-02-11 Zhiqiang Que , Jose G. F. Coutinho , Ce Guo , Hongxiang Fan , Wayne Luk

Learning from Experience for Rapid Generation of Local Car Maneuvers

Being able to rapidly respond to the changing scenes and traffic situations by generating feasible local paths is of pivotal importance for car autonomy. We propose to train a deep neural network (DNN) to plan feasible and nearly-optimal…

Robotics · Computer Science 2023-01-26 Piotr Kicki , Tomasz Gawron , Krzysztof Ćwian , Mete Ozay , Piotr Skrzypczyński

LRMP: Layer Replication with Mixed Precision for Spatial In-memory DNN Accelerators

In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a promising approach to address the rapidly growing computational demands of Deep Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC…

Hardware Architecture · Computer Science 2023-12-07 Abinand Nallathambi , Christin David Bose , Wilfried Haensch , Anand Raghunathan

MetaML: Automating Customizable Cross-Stage Design-Flow for Deep Learning Acceleration

This paper introduces a novel optimization framework for deep neural network (DNN) hardware accelerators, enabling the rapid development of customized and automated design flows. More specifically, our approach aims to automate the…

Machine Learning · Computer Science 2023-11-08 Zhiqiang Que , Shuo Liu , Markus Rognlien , Ce Guo , Jose G. F. Coutinho , Wayne Luk

Demystifying Map Space Exploration for NPUs

Map Space Exploration is the problem of finding optimized mappings of a Deep Neural Network (DNN) model on an accelerator. It is known to be extremely computationally expensive, and there has been active research looking at both heuristics…

Machine Learning · Computer Science 2022-10-10 Sheng-Chun Kao , Angshuman Parashar , Po-An Tsai , Tushar Krishna

DLFusion: An Auto-Tuning Compiler for Layer Fusion on Deep Neural Network Accelerator

Many hardware vendors have introduced specialized deep neural networks (DNN) accelerators owing to their superior performance and efficiency. As such, how to generate and optimize the code for the hardware accelerator becomes an important…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-12 Zihan Liu , Jingwen Leng , Quan Chen , Chao Li , Wenli Zheng , Li Li , Minyi Guo

Energy-efficient DNN Inference on Approximate Accelerators Through Formal Property Exploration

Deep Neural Networks (DNNs) are being heavily utilized in modern applications and are putting energy-constraint devices to the test. To bypass high energy consumption issues, approximate computing has been employed in DNN accelerators to…

Machine Learning · Computer Science 2022-07-26 Ourania Spantidi , Georgios Zervakis , Iraklis Anagnostopoulos , Jörg Henkel

Computational complexity reduction of deep neural networks

Deep neural networks (DNN) have been widely used and play a major role in the field of computer vision and autonomous navigation. However, these DNNs are computationally complex and their deployment over resource-constrained platforms is…

Machine Learning · Computer Science 2022-08-01 Mee Seong Im , Venkat R. Dasari

KAPLA: Pragmatic Representation and Fast Solving of Scalable NN Accelerator Dataflow

Dataflow scheduling decisions are of vital importance to neural network (NN) accelerators. Recent scalable NN accelerators support a rich set of advanced dataflow techniques. The problems of comprehensively representing and quickly finding…

Hardware Architecture · Computer Science 2023-06-29 Zhiyao Li , Mingyu Gao

Serving Recurrent Neural Networks Efficiently with a Spatial Accelerator

Recurrent Neural Network (RNN) applications form a major class of AI-powered, low-latency data center workloads. Most execution models for RNN acceleration break computation graphs into BLAS kernels, which lead to significant inter-kernel…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-01 Tian Zhao , Yaqi Zhang , Kunle Olukotun

Real-time Local Feature with Global Visual Information Enhancement

Local feature provides compact and invariant image representation for various visual tasks. Current deep learning-based local feature algorithms always utilize convolution neural network (CNN) architecture with limited receptive field.…

Computer Vision and Pattern Recognition · Computer Science 2022-11-22 Jinyu Miao , Haosong Yue , Zhong Liu , Xingming Wu , Zaojun Fang , Guilin Yang

Mapping Space Exploration for Multi-Chiplet Accelerators Targeting LLM Inference Serving Workloads

Large Language Models (LLMs) impose massive computational demands, driving the need for scalable multi-chiplet accelerators. However, existing mapping space exploration efforts for such accelerators primarily focus on traditional…

Hardware Architecture · Computer Science 2026-04-02 Boyu Li , Zongwei Zhu , Yi Xiong , Qianyue Cao , Jiawei Geng , Xiaonan Zhang , Xi Li

RAPIDNN: In-Memory Deep Neural Network Acceleration Framework

Deep neural networks (DNN) have demonstrated effectiveness for various applications such as image processing, video segmentation, and speech recognition. Running state-of-the-art DNNs on current systems mostly relies on either…

Neural and Evolutionary Computing · Computer Science 2019-04-15 Mohsen Imani , Mohammad Samragh , Yeseong Kim , Saransh Gupta , Farinaz Koushanfar , Tajana Rosing

Understanding the Design-Space of Sparse/Dense Multiphase GNN dataflows on Spatial Accelerators

Graph Neural Networks (GNNs) have garnered a lot of recent interest because of their success in learning representations from graph-structured data across several critical applications in cloud and HPC. Owing to their unique compute and…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-08 Raveesh Garg , Eric Qin , Francisco Muñoz-Martínez , Robert Guirado , Akshay Jain , Sergi Abadal , José L. Abellán , Manuel E. Acacio , Eduard Alarcón , Sivasankaran Rajamanickam , Tushar Krishna

Workload-Aware Hardware Accelerator Mining for Distributed Deep Learning Training

In this paper, we present a novel technique to search for hardware architectures of accelerators optimized for end-to-end training of deep neural networks (DNNs). Our approach addresses both single-device and distributed pipeline and tensor…

Hardware Architecture · Computer Science 2024-04-24 Muhammad Adnan , Amar Phanishayee , Janardhan Kulkarni , Prashant J. Nair , Divya Mahajan

MARS: Exploiting Multi-Level Parallelism for DNN Workloads on Adaptive Multi-Accelerator Systems

Along with the fast evolution of deep neural networks, the hardware system is also developing rapidly. As a promising solution achieving high scalability and low manufacturing cost, multi-accelerator systems widely exist in data centers,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-25 Guan Shen , Jieru Zhao , Zeke Wang , Zhe Lin , Wenchao Ding , Chentao Wu , Quan Chen , Minyi Guo