Related papers: CoSA: Scheduling by Constrained Optimization for S…

SoMa: Identifying, Exploring, and Understanding the DRAM Communication Scheduling Space for DNN Accelerators

Modern Deep Neural Network (DNN) accelerators are equipped with increasingly larger on-chip buffers to provide more opportunities to alleviate the increasingly severe DRAM bandwidth pressure. However, most existing research on buffer…

Hardware Architecture · Computer Science 2025-01-23 Jingwei Cai, Xuan Wang, Mingyu Gao, Sen Peng, Zijian Zhu, Yuchen Wei, Zuotong Wu, Kaisheng Ma

DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators

In the hardware design space exploration process, it is critical to optimize both hardware parameters and algorithm-to-hardware mappings. Previous work has largely approached this simultaneous optimization problem by separately exploring…

Hardware Architecture · Computer Science 2025-09-16 Charles Hong, Qijing Huang, Grace Dinh, Mahesh Subedar, Yakun Sophia Shao

Combined Scheduling, Memory Allocation and Tensor Replacement for Minimizing Off-Chip Data Accesses of DNN Accelerators

Specialized hardware accelerators have been extensively used for Deep Neural Networks (DNNs) to provide power/performance benefits. These accelerators contain specialized hardware that supports DNN operators, and scratchpad memory for…

Machine Learning · Computer Science 2023-12-01 Yi Li, Aarti Gupta, Sharad Malik

SALSA: Simulated Annealing based Loop-Ordering Scheduler for DNN Accelerators

To meet the growing need for computational power for DNNs, multiple specialized hardware architectures have been proposed. Each DNN layer should be mapped onto the hardware with the most efficient schedule; however, SotA schedulers struggle…

Hardware Architecture · Computer Science 2024-06-17 Victor J. B. Jung, Arne Symons, Linyan Mei, Marian Verhelst, Luca Benini

Software-defined Design Space Exploration for an Efficient DNN Accelerator Architecture

Deep neural networks (DNNs) have been shown to outperform conventional machine learning algorithms across a wide range of applications, e.g., image recognition, object detection, robotics, and natural language processing. However, the high…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-23 Ye Yu, Yingmin Li, Shuai Che, Niraj K. Jha, Weifeng Zhang

Towards Budget-Driven Hardware Optimization for Deep Convolutional Neural Networks using Stochastic Computing

Recently, Deep Convolutional Neural Network (DCNN) has achieved tremendous success in many machine learning applications. Nevertheless, the deep structure has brought significant increases in computation complexity. Large-scale deep learning…

Neural and Evolutionary Computing · Computer Science 2018-05-14 Zhe Li, Ji Li, Ao Ren, Caiwen Ding, Jeffrey Draper, Qinru Qiu, Bo Yuan, Yanzhi Wang

NASA: Neural Architecture Search and Acceleration for Hardware Inspired Hybrid Networks

Multiplication is arguably the most cost-dominant operation in modern deep neural networks (DNNs), limiting their achievable efficiency and thus more extensive deployment in resource-constrained applications. To tackle this limitation,…

Hardware Architecture · Computer Science 2022-12-20 Huihong Shi, Haoran You, Yang Zhao, Zhongfeng Wang, Yingyan Lin

DNA: Differentiable Network-Accelerator Co-Search

Powerful yet complex deep neural networks (DNNs) have fueled a booming demand for efficient DNN solutions to bring DNN-powered intelligence into numerous applications. Jointly optimizing the networks and their accelerators is promising in…

Machine Learning · Computer Science 2025-01-07 Yongan Zhang, Yonggan Fu, Weiwen Jiang, Chaojian Li, Haoran You, Meng Li, Vikas Chandra, Yingyan Celine Lin

DORA: Dataflow-Instruction Orchestration Architecture for DNN Acceleration

As deep neural networks become significantly more diverse and complex, achieving high performance and efficiency on complicated DNN models faces pressing challenges. Modern DNN workloads are increasingly diverse in operation types, tensor…

Hardware Architecture · Computer Science 2026-05-25 Xingzhen Chen, Zhuoping Yang, Jinming Zhuang, Shixin Ji, Sarah Schultz, Zheng Dong, Weisong Shi, Peipei Zhou

A Programmable Approach to Neural Network Compression

Deep neural networks (DNNs) frequently contain far more weights, represented at a higher precision, than are required for the specific task which they are trained to perform. Consequently, they can often be compressed using techniques such…

Machine Learning · Computer Science 2020-12-03 Vinu Joseph, Saurav Muralidharan, Animesh Garg, Michael Garland, Ganesh Gopalakrishnan

MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks

Driven by the wide adoption of deep neural networks (DNNs) across different application domains, multi-tenancy execution, where multiple DNNs are deployed simultaneously on the same hardware, has been proposed to satisfy the latency…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-11 Seah Kim, Hasan Genc, Vadim Vadimovich Nikiforov, Krste Asanović, Borivoje Nikolić, Yakun Sophia Shao

Learned Hardware/Software Co-Design of Neural Accelerators

The use of deep learning has grown at an exponential rate, giving rise to numerous specialized hardware and software systems for deep learning. Because the design space of deep learning software stacks and hardware accelerators is diverse…

Machine Learning · Computer Science 2020-10-06 Zhan Shi, Chirag Sakhuja, Milad Hashemi, Kevin Swersky, Calvin Lin

MetaNet: Automated Dynamic Selection of Scheduling Policies in Cloud Environments

Task scheduling is a well-studied problem in the context of optimizing the Quality of Service (QoS) of cloud computing environments. In order to sustain the rapid growth of computational demands, one of the most important QoS metrics for…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-24 Shreshth Tuli, Giuliano Casale, Nicholas R. Jennings

A Sum-of-Ratios Multi-Dimensional-Knapsack Decomposition for DNN Resource Scheduling

In recent years, to sustain the resource-intensive computational needs for training deep neural networks (DNNs), it is widely accepted that exploiting the parallelism in large-scale computing clusters is critical for the efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-31 Menglu Yu, Chuan Wu, Bo Ji, Jia Liu

Balancing Efficiency and Flexibility for DNN Acceleration via Temporal GPU-Systolic Array Integration

Research interest in specialized hardware accelerators for deep neural networks (DNN) has spiked recently owing to their superior performance and efficiency. However, today's DNN accelerators primarily focus on accelerating specific…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-11 Cong Guo, Yangjie Zhou, Jingwen Leng, Yuhao Zhu, Zidong Du, Quan Chen, Chao Li, Bin Yao, Minyi Guo

Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices

High quality AI solutions require joint optimization of AI algorithms, such as deep neural networks (DNNs), and their hardware accelerators. To improve the overall solution quality as well as to boost the design productivity, efficient…

Hardware Architecture · Computer Science 2020-10-16 Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, Jinjun Xiong, Wen-mei Hwu, Deming Chen

ACCO: Automated Causal CNN Scheduling Optimizer for Real-Time Edge Accelerators

Spatio-Temporal Convolutional Neural Networks (ST-CNN) allow extending CNN capabilities from image processing to consecutive temporal-pattern recognition. Generally, state-of-the-art (SotA) ST-CNNs inflate the feature maps and weights from…

Signal Processing · Electrical Eng. & Systems 2024-06-12 Jun Yin, Linyan Mei, Andre Guntoro, Marian Verhelst

Scheduling Techniques of AI Models on Modern Heterogeneous Edge GPU -- A Critical Review

In recent years, the development of specialized edge computing devices has significantly increased, driven by the growing demand for AI models. These devices, such as the NVIDIA Jetson series, must efficiently handle increased data…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-03 Ashiyana Abdul Majeed, Mahmoud Meribout

Workload-Aware Hardware Accelerator Mining for Distributed Deep Learning Training

In this paper, we present a novel technique to search for hardware architectures of accelerators optimized for end-to-end training of deep neural networks (DNNs). Our approach addresses both single-device and distributed pipeline and tensor…

Hardware Architecture · Computer Science 2024-04-24 Muhammad Adnan, Amar Phanishayee, Janardhan Kulkarni, Prashant J. Nair, Divya Mahajan

Automated Design Space Exploration for optimised Deployment of DNN on Arm Cortex-A CPUs

The spread of deep learning on embedded devices has prompted the development of numerous methods to optimise the deployment of deep neural networks (DNN). Works have mainly focused on: i) efficient DNN architectures, ii) network…

Machine Learning · Computer Science 2020-12-29 Miguel de Prado, Andrew Mundy, Rabia Saeed, Maurizio Denna, Nuria Pazos, Luca Benini