Related papers: StruM: Structured Mixed Precision for Efficient De…

Mixed-precision deep learning based on computational memory

Deep neural networks (DNNs) have revolutionized the field of artificial intelligence and have achieved unprecedented success in cognitive tasks such as image and speech recognition. Training of large DNNs, however, is computationally…

Emerging Technologies · Computer Science 2020-05-13 S. R. Nandakumar , Manuel Le Gallo , Christophe Piveteau , Vinay Joshi , Giovanni Mariani , Irem Boybat , Geethan Karunaratne , Riduan Khaddam-Aljameh , Urs Egger , Anastasios Petropoulos , Theodore Antonakopoulos , Bipin Rajendran , Abu Sebastian , Evangelos Eleftheriou

Minimizing Area and Energy of Deep Learning Hardware Design Using Collective Low Precision and Structured Compression

Deep learning algorithms have shown tremendous success in many recognition tasks; however, these algorithms typically include a deep neural network (DNN) structure and a large number of parameters, which makes it challenging to implement…

Neural and Evolutionary Computing · Computer Science 2018-04-23 Shihui Yin , Gaurav Srivastava , Shreyas K. Venkataramanaiah , Chaitali Chakrabarti , Visar Berisha , Jae-sun Seo

M4BRAM: Mixed-Precision Matrix-Matrix Multiplication in FPGA Block RAMs

Mixed-precision quantization is a popular approach for compressing deep neural networks (DNNs). However, it is challenging to scale the performance efficiently with mixed-precision DNNs given the current FPGA architecture and conventional…

Hardware Architecture · Computer Science 2023-11-07 Yuzong Chen , Jordan Dotzel , Mohamed S. Abdelfattah

Hardware-Software Co-optimised Fast and Accurate Deep Reconfigurable Spiking Inference Accelerator Architecture Design Methodology

Spiking Neural Networks (SNNs) have emerged as a promising approach to improve the energy efficiency of machine learning models, as they naturally implement event-driven computations while avoiding expensive multiplication operations. In…

Neural and Evolutionary Computing · Computer Science 2024-10-31 Anagha Nimbekar , Prabodh Katti , Chen Li , Bashir M. Al-Hashimi , Amit Acharyya , Bipin Rajendran

Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression

Compressing Deep Neural Network (DNN) models to alleviate the storage and computation requirements is essential for practical applications, especially for resource limited devices. Although capable of reducing a reasonable amount of model…

Machine Learning · Computer Science 2021-06-17 Sheng Lin , Wei Jiang , Wei Wang , Kaidi Xu , Yanzhi Wang , Shan Liu , Songnan Li

Accelerating DNN Training with Structured Data Gradient Pruning

Weight pruning is a technique to make Deep Neural Network (DNN) inference more computationally efficient by reducing the number of model parameters over the course of training. However, most weight pruning techniques generally does not…

Machine Learning · Computer Science 2022-02-03 Bradley McDanel , Helia Dinh , John Magallanes

StructADMM: A Systematic, High-Efficiency Framework of Structured Weight Pruning for DNNs

Weight pruning methods of DNNs have been demonstrated to achieve a good model pruning rate without loss of accuracy, thereby alleviating the significant computation/storage requirements of large-scale DNNs. Structured weight pruning methods…

Neural and Evolutionary Computing · Computer Science 2019-03-28 Tianyun Zhang , Shaokai Ye , Kaiqi Zhang , Xiaolong Ma , Ning Liu , Linfeng Zhang , Jian Tang , Kaisheng Ma , Xue Lin , Makan Fardad , Yanzhi Wang

STRIDe: Cross-Coupled STT-MRAM Enabling Robust In-Memory-Computing for Deep Neural Network Accelerators

As deep neural network (DNN) models are growing exponentially in size, their deployment on resource-constrained edge platforms is becoming increasingly challenging. In-memory-computing (IMC) with non-volatile memories (NVMs) has emerged as…

Emerging Technologies · Computer Science 2026-04-07 Imtiaz Ahmed , Sumeet Kumar Gupta

A Flexible Precision Scaling Deep Neural Network Accelerator with Efficient Weight Combination

Deploying mixed-precision neural networks on edge devices is friendly to hardware resources and power consumption. To support fully mixed-precision neural network inference, it is necessary to design flexible hardware accelerators for…

Hardware Architecture · Computer Science 2025-02-04 Liang Zhao , Kunming Shao , Fengshi Tian , Tim Kwang-Ting Cheng , Chi-Ying Tsui , Yi Zou

Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework

Hardware accelerations of deep learning systems have been extensively investigated in industry and academia. The aim of this paper is to achieve ultra-high energy efficiency and performance for hardware implementations of deep neural…

Machine Learning · Computer Science 2018-02-20 Yanzhi Wang , Caiwen Ding , Zhe Li , Geng Yuan , Siyu Liao , Xiaolong Ma , Bo Yuan , Xuehai Qian , Jian Tang , Qinru Qiu , Xue Lin

Speeding up and reducing memory usage for scientific machine learning via mixed precision

Scientific machine learning (SciML) has emerged as a versatile approach to address complex computational science and engineering problems. Within this field, physics-informed neural networks (PINNs) and deep operator networks (DeepONets)…

Machine Learning · Computer Science 2024-01-31 Joel Hayford , Jacob Goldman-Wetzler , Eric Wang , Lu Lu

Balanced Sparsity for Efficient DNN Inference on GPU

In trained deep neural networks, unstructured pruning can reduce redundant weights to lower storage cost. However, it requires the customization of hardwares to speed up practical inference. Another trend accelerates sparse model inference…

Computer Vision and Pattern Recognition · Computer Science 2020-10-30 Zhuliang Yao , Shijie Cao , Wencong Xiao , Chen Zhang , Lanshun Nie

Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design

Sparse training is one of the promising techniques to reduce the computational cost of DNNs while retaining high accuracy. In particular, N:M fine-grained structured sparsity, where only N out of consecutive M elements can be nonzero, has…

Machine Learning · Computer Science 2023-09-25 Chao Fang , Wei Sun , Aojun Zhou , Zhongfeng Wang

NicePIM: Design Space Exploration for Processing-In-Memory DNN Accelerators with 3D-Stacked-DRAM

With the widespread use of deep neural networks(DNNs) in intelligent systems, DNN accelerators with high performance and energy efficiency are greatly demanded. As one of the feasible processing-in-memory(PIM) architectures,…

Hardware Architecture · Computer Science 2023-12-22 Junpeng Wang , Mengke Ge , Bo Ding , Qi Xu , Song Chen , Yi Kang

Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning

As deep neural networks are growing in size and being increasingly deployed to more resource-limited devices, there has been a recent surge of interest in network pruning methods, which aim to remove less important weights or activations of…

Machine Learning · Computer Science 2020-06-23 Minyoung Song , Jaehong Yoon , Eunho Yang , Sung Ju Hwang

Structured Pruning of Deep Convolutional Neural Networks

Real time application of deep learning algorithms is often hindered by high computational complexity and frequent memory accesses. Network pruning is a promising technique to solve this problem. However, pruning usually results in irregular…

Neural and Evolutionary Computing · Computer Science 2015-12-31 Sajid Anwar , Kyuyeon Hwang , Wonyong Sung

P3-LLM: An Integrated NPU-PIM Accelerator for Edge LLM Inference Using Hybrid Numerical Formats

The substantial memory bandwidth and computational demands of large language models (LLMs) present critical challenges for efficient inference. To tackle this, the literature has explored heterogeneous systems that combine neural processing…

Hardware Architecture · Computer Science 2026-05-05 Yuzong Chen , Chao Fang , Xilai Dai , Yuheng Wu , Thierry Tambe , Marian Verhelst , Mohamed S. Abdelfattah

Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Deep Neural Networks (DNNs) have shown significant advantages in a wide variety of domains. However, DNNs are becoming computationally intensive and energy hungry at an exponential pace, while at the same time, there is a vast demand for…

Machine Learning · Computer Science 2023-12-27 Konstantinos Balaskas , Andreas Karatzas , Christos Sad , Kostas Siozios , Iraklis Anagnostopoulos , Georgios Zervakis , Jörg Henkel

SLIM: A Heterogeneous Accelerator for Edge Inference of Sparse Large Language Model via Adaptive Thresholding

Large language models (LLMs) have demonstrated exceptional proficiency in understanding and generating human language, but efficient inference on resource-constrained embedded devices remains challenging due to large model sizes and…

Hardware Architecture · Computer Science 2025-07-15 Weihong Xu , Haein Choi , Po-kai Hsu , Shimeng Yu , Tajana Rosing

High performance and energy efficient inference for deep learning on ARM processors

We evolve PyDTNN, a framework for distributed parallel training of Deep Neural Networks (DNNs), into an efficient inference tool for convolutional neural networks. Our optimization process on multicore ARM processors involves several…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-20 Adrián Castelló , Sergio Barrachina , Manuel F. Dolz , Enrique S. Quintana-Ortí , Pau San Juan