Related papers: Workload-Aware Hardware Accelerator Mining for Dis…

Software-defined Design Space Exploration for an Efficient DNN Accelerator Architecture

Deep neural networks (DNNs) have been shown to outperform conventional machine learning algorithms across a wide range of applications, e.g., image recognition, object detection, robotics, and natural language processing. However, the high…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-04-23 · Ye Yu, Yingmin Li, Shuai Che, Niraj K. Jha, Weifeng Zhang

A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators

The rapidly-changing deep learning landscape presents a unique opportunity for building inference accelerators optimized for specific datacenter-scale workloads. We propose Full-stack Accelerator Search Technique (FAST), a hardware…

Machine Learning · Computer Science · 2022-02-02 · Dan Zhang, Safeen Huda, Ebrahim Songhori, Kartik Prabhu, Quoc Le, Anna Goldie, Azalia Mirhoseini

Integrated Hardware Architecture and Device Placement Search

Distributed execution of deep learning training involves a dynamic interplay between hardware accelerator architecture and device placement strategy. This is the first work to explore the co-optimization of determining the optimal…

Machine Learning · Computer Science · 2024-07-19 · Irene Wang, Jakub Tarnawski, Amar Phanishayee, Divya Mahajan

Rethinking Co-design of Neural Architectures and Hardware Accelerators

Neural architectures and hardware accelerators have been two driving forces for the progress in deep learning. Previous works typically attempt to optimize hardware given a fixed model architecture or model architecture given fixed…

Machine Learning · Computer Science · 2021-02-18 · Yanqi Zhou, Xuanyi Dong, Berkin Akin, Mingxing Tan, Daiyi Peng, Tianjian Meng, Amir Yazdanbakhsh, Da Huang, Ravi Narayanaswami, James Laudon

Being-ahead: Benchmarking and Exploring Accelerators for Hardware-Efficient AI Deployment

Customized hardware accelerators have been developed to provide improved performance and efficiency for DNN inference and training. However, the existing hardware accelerators may not always be suitable for handling various DNN models as…

Hardware Architecture · Computer Science · 2021-04-07 · Xiaofan Zhang, Hanchen Ye, Deming Chen

HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array

With the rise of artificial intelligence in recent years, Deep Neural Networks (DNNs) have been widely used in many domains. To achieve high performance and energy efficiency, hardware acceleration (especially inference) of DNNs is…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-01-17 · Linghao Song, Jiachen Mao, Youwei Zhuo, Xuehai Qian, Hai Li, Yiran Chen

Comprehensive Design Space Exploration for Tensorized Neural Network Hardware Accelerators

High-order tensor decomposition has been widely adopted to obtain compact deep neural networks for edge deployment. However, existing studies focus primarily on its algorithmic advantages such as accuracy and compression ratio, while…

Hardware Architecture · Computer Science · 2025-11-26 · Jinsong Zhang, Minghe Li, Jiayi Tian, Jinming Lu, Zheng Zhang

A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures

Given their increasing size and complexity, the need for efficient execution of deep neural networks has become increasingly pressing in the design of heterogeneous High-Performance Computing (HPC) and edge platforms, leading to a wide…

Hardware Architecture · Computer Science · 2025-05-23 · Serena Curzel, Fabrizio Ferrandi, Leandro Fiorin, Daniele Ielmini, Cristina Silvano, Francesco Conti, Luca Bompani, Luca Benini, Enrico Calore, Sebastiano Fabio Schifano, Cristian Zambelli, Maurizio Palesi, Giuseppe Ascia, Enrico Russo, Valeria Cardellini, Salvatore Filippone, Francesco Lo Presti, Stefania Perri

Automated Design Space Exploration for optimised Deployment of DNN on Arm Cortex-A CPUs

The spread of deep learning on embedded devices has prompted the development of numerous methods to optimise the deployment of deep neural networks (DNN). Works have mainly focused on: i) efficient DNN architectures, ii) network…

Machine Learning · Computer Science · 2020-12-29 · Miguel de Prado, Andrew Mundy, Rabia Saeed, Maurizio Denna, Nuria Pazos, Luca Benini

Benchmarking Resource Usage for Efficient Distributed Deep Learning

Deep learning (DL) workflows demand an ever-increasing budget of compute and energy in order to achieve outsized gains. Neural architecture searches, hyperparameter sweeps, and rapid prototyping consume immense resources that can prevent…

Machine Learning · Computer Science · 2022-02-01 · Nathan C. Frey, Baolin Li, Joseph McDonald, Dan Zhao, Michael Jones, David Bestor, Devesh Tiwari, Vijay Gadepally, Siddharth Samsi

Hardware-Aware Machine Learning: Modeling and Optimization

Recent breakthroughs in Deep Learning (DL) applications have made DL models a key component in almost every modern computing system. The increased popularity of DL applications deployed on a wide-spectrum of platforms have resulted in a…

Machine Learning · Computer Science · 2018-09-17 · Diana Marculescu, Dimitrios Stamoulis, Ermao Cai

Learned Hardware/Software Co-Design of Neural Accelerators

The use of deep learning has grown at an exponential rate, giving rise to numerous specialized hardware and software systems for deep learning. Because the design space of deep learning software stacks and hardware accelerators is diverse…

Machine Learning · Computer Science · 2020-10-06 · Zhan Shi, Chirag Sakhuja, Milad Hashemi, Kevin Swersky, Calvin Lin

MetaML-Pro: Cross-Stage Design Flow Automation for Efficient Deep Learning Acceleration

This paper presents a unified framework for codifying and automating optimization strategies to efficiently deploy deep neural networks (DNNs) on resource-constrained hardware, such as FPGAs, while maintaining high performance, accuracy,…

Hardware Architecture · Computer Science · 2026-02-11 · Zhiqiang Que, Jose G. F. Coutinho, Ce Guo, Hongxiang Fan, Wayne Luk

Hardware-Aware Neural Architecture Search for Encrypted Traffic Classification on Resource-Constrained Devices

This paper presents a hardware-efficient deep neural network (DNN), optimized through hardware-aware neural architecture search (HW-NAS); the DNN supports the classification of session-level encrypted traffic on resource-constrained…

Networking and Internet Architecture · Computer Science · 2026-03-20 · Adel Chehade, Edoardo Ragusa, Paolo Gastaldo, Rodolfo Zunino

HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator

Deep Neural Networks (DNNs) excel in learning hierarchical representations from raw data, such as images, audio, and text. To compute these DNN models with high performance and energy efficiency, these models are usually deployed onto…

Hardware Architecture · Computer Science · 2024-06-06 · Zhewen Yu, Sudarshan Sreeram, Krish Agrawal, Junyi Wu, Alexander Montgomerie-Corcoran, Cheng Zhang, Jianyi Cheng, Christos-Savvas Bouganis, Yiren Zhao

A Semi-Decoupled Approach to Fast and Optimal Hardware-Software Co-Design of Neural Accelerators

In view of the performance limitations of fully-decoupled designs for neural architectures and accelerators, hardware-software co-design has been emerging to fully reap the benefits of flexible design spaces and optimize neural network…

Hardware Architecture · Computer Science · 2022-03-29 · Bingqian Lu, Zheyu Yan, Yiyu Shi, Shaolei Ren

AIRCHITECT v2: Learning the Hardware Accelerator Design Space through Unified Representations

Design space exploration (DSE) plays a crucial role in enabling custom hardware architectures, particularly for emerging applications like AI, where optimized and specialized designs are essential. With the growing complexity of deep neural…

Machine Learning · Computer Science · 2025-01-20 · Jamin Seo, Akshat Ramachandran, Yu-Chuan Chuang, Anirudh Itagi, Tushar Krishna

Hardware Acceleration for Neural Networks: A Comprehensive Survey

Neural networks have become dominant computational workloads across cloud and edge platforms, but their rapid growth in model size and deployment diversity has exposed hardware bottlenecks increasingly dominated by memory movement,…

Systems and Control · Electrical Eng. & Systems · 2026-01-16 · Bin Xu, Ayan Banerjee, Sandeep Gupta

An End-to-End HW/SW Co-Design Methodology to Design Efficient Deep Neural Network Systems using Virtual Models

End-to-end performance estimation and measurement of deep neural network (DNN) systems become more important with increasing complexity of DNN systems consisting of hardware and software components. The methodology proposed in this paper…

Machine Learning · Computer Science · 2019-11-19 · Michael J. Klaiber, Sebastian Vogel, Axel Acosta, Robert Korn, Leonardo Ecco, Kristine Back, Andre Guntoro, Ingo Feldner

Heterogeneous Multi-core Array-based DNN Accelerator

In this article, we investigate the impact of architectural parameters of array-based DNN accelerators on accelerator's energy consumption and performance in a wide variety of network topologies. For this purpose, we have developed a tool…

Hardware Architecture · Computer Science · 2022-06-28 · Mohammad Ali Maleki, Mehdi Kamal, Ali Afzali-Kusha