Related papers: Workload-Aware Hardware Accelerator Mining for Dis…

200 papers

Deep neural networks (DNNs) have been shown to outperform conventional machine learning algorithms across a wide range of applications, e.g., image recognition, object detection, robotics, and natural language processing. However, the high…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-23 Ye Yu, Yingmin Li, Shuai Che, Niraj K. Jha, Weifeng Zhang

The rapidly-changing deep learning landscape presents a unique opportunity for building inference accelerators optimized for specific datacenter-scale workloads. We propose Full-stack Accelerator Search Technique (FAST), a hardware…

Machine Learning · Computer Science 2022-02-02 Dan Zhang, Safeen Huda, Ebrahim Songhori, Kartik Prabhu, Quoc Le, Anna Goldie, Azalia Mirhoseini

Distributed execution of deep learning training involves a dynamic interplay between hardware accelerator architecture and device placement strategy. This is the first work to explore the co-optimization of determining the optimal…

Machine Learning · Computer Science 2024-07-19 Irene Wang, Jakub Tarnawski, Amar Phanishayee, Divya Mahajan

Neural architectures and hardware accelerators have been two driving forces for the progress in deep learning. Previous works typically attempt to optimize hardware given a fixed model architecture or model architecture given fixed…

Customized hardware accelerators have been developed to provide improved performance and efficiency for DNN inference and training. However, the existing hardware accelerators may not always be suitable for handling various DNN models as…

Hardware Architecture · Computer Science 2021-04-07 Xiaofan Zhang, Hanchen Ye, Deming Chen

With the rise of artificial intelligence in recent years, Deep Neural Networks (DNNs) have been widely used in many domains. To achieve high performance and energy efficiency, hardware acceleration (especially inference) of DNNs is…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-17 Linghao Song, Jiachen Mao, Youwei Zhuo, Xuehai Qian, Hai Li, Yiran Chen

High-order tensor decomposition has been widely adopted to obtain compact deep neural networks for edge deployment. However, existing studies focus primarily on its algorithmic advantages such as accuracy and compression ratio, while…

Hardware Architecture · Computer Science 2025-11-26 Jinsong Zhang, Minghe Li, Jiayi Tian, Jinming Lu, Zheng Zhang

Given their increasing size and complexity, the need for efficient execution of deep neural networks has become increasingly pressing in the design of heterogeneous High-Performance Computing (HPC) and edge platforms, leading to a wide…

The spread of deep learning on embedded devices has prompted the development of numerous methods to optimise the deployment of deep neural networks (DNN). Works have mainly focused on: i) efficient DNN architectures, ii) network…

Machine Learning · Computer Science 2020-12-29 Miguel de Prado, Andrew Mundy, Rabia Saeed, Maurizio Denna, Nuria Pazos, Luca Benini

Deep learning (DL) workflows demand an ever-increasing budget of compute and energy in order to achieve outsized gains. Neural architecture searches, hyperparameter sweeps, and rapid prototyping consume immense resources that can prevent…

Recent breakthroughs in Deep Learning (DL) applications have made DL models a key component in almost every modern computing system. The increased popularity of DL applications deployed on a wide spectrum of platforms has resulted in a…

Machine Learning · Computer Science 2018-09-17 Diana Marculescu, Dimitrios Stamoulis, Ermao Cai

The use of deep learning has grown at an exponential rate, giving rise to numerous specialized hardware and software systems for deep learning. Because the design space of deep learning software stacks and hardware accelerators is diverse…

Machine Learning · Computer Science 2020-10-06 Zhan Shi, Chirag Sakhuja, Milad Hashemi, Kevin Swersky, Calvin Lin

This paper presents a unified framework for codifying and automating optimization strategies to efficiently deploy deep neural networks (DNNs) on resource-constrained hardware, such as FPGAs, while maintaining high performance, accuracy,…

Hardware Architecture · Computer Science 2026-02-11 Zhiqiang Que, Jose G. F. Coutinho, Ce Guo, Hongxiang Fan, Wayne Luk

This paper presents a hardware-efficient deep neural network (DNN), optimized through hardware-aware neural architecture search (HW-NAS); the DNN supports the classification of session-level encrypted traffic on resource-constrained…

Networking and Internet Architecture · Computer Science 2026-03-20 Adel Chehade, Edoardo Ragusa, Paolo Gastaldo, Rodolfo Zunino

Deep Neural Networks (DNNs) excel in learning hierarchical representations from raw data, such as images, audio, and text. To compute these DNN models with high performance and energy efficiency, these models are usually deployed onto…

In view of the performance limitations of fully-decoupled designs for neural architectures and accelerators, hardware-software co-design has been emerging to fully reap the benefits of flexible design spaces and optimize neural network…

Hardware Architecture · Computer Science 2022-03-29 Bingqian Lu, Zheyu Yan, Yiyu Shi, Shaolei Ren

Design space exploration (DSE) plays a crucial role in enabling custom hardware architectures, particularly for emerging applications like AI, where optimized and specialized designs are essential. With the growing complexity of deep neural…

Machine Learning · Computer Science 2025-01-20 Jamin Seo, Akshat Ramachandran, Yu-Chuan Chuang, Anirudh Itagi, Tushar Krishna

Neural networks have become dominant computational workloads across cloud and edge platforms, but their rapid growth in model size and deployment diversity has exposed hardware bottlenecks increasingly dominated by memory movement,…

Systems and Control · Electrical Eng. & Systems 2026-01-16 Bin Xu, Ayan Banerjee, Sandeep Gupta

End-to-end performance estimation and measurement of deep neural network (DNN) systems become more important with increasing complexity of DNN systems consisting of hardware and software components. The methodology proposed in this paper…

Machine Learning · Computer Science 2019-11-19 Michael J. Klaiber, Sebastian Vogel, Axel Acosta, Robert Korn, Leonardo Ecco, Kristine Back, Andre Guntoro, Ingo Feldner

In this article, we investigate the impact of architectural parameters of array-based DNN accelerators on the accelerator's energy consumption and performance in a wide variety of network topologies. For this purpose, we have developed a tool…

Hardware Architecture · Computer Science 2022-06-28 Mohammad Ali Maleki, Mehdi Kamal, Ali Afzali-Kusha