Related papers: Flexagon: A Multi-Dataflow Sparse-Sparse Matrix Mu…

Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication

Sparse matrix-matrix multiplication (SpGEMM) is a critical operation in numerous fields, including scientific computing, graph analytics, and deep learning. These applications exploit the sparsity of matrices to reduce storage and…

Machine Learning · Computer Science 2024-08-30 Sanjali Yadav, Bahar Asgari

FlexiSAGA: A Flexible Systolic Array GEMM Accelerator for Sparse and Dense Processing

Artificial Intelligence (AI) algorithms, such as Deep Neural Networks (DNNs), have become an important tool for a wide range of applications, from computer vision to natural language processing. However, the computational complexity of DNN…

Performance · Computer Science 2025-06-03 Mika Markus Müller, Konstantin Lübeck, Alexander Louis-Ferdinand Jung, Jannik Steinmetz, Oliver Bringmann

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

Sparse Matrix-matrix Multiplication (SpMM) and Sampled Dense-dense Matrix Multiplication (SDDMM) are important sparse operators in scientific computing and deep learning. Tensor Core Units (TCUs) enhance modern accelerators with superior…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-17 Jinliang Shi, Shigang Li, Youxuan Xu, Rongtian Fu, Xueying Wang, Tong Wu

SparseNN: An Energy-Efficient Neural Network Accelerator Exploiting Input and Output Sparsity

Contemporary Deep Neural Network (DNN) contains millions of synaptic connections with tens to hundreds of layers. The large computation and memory requirements pose a challenge to the hardware design. In this work, we leverage the intrinsic…

Machine Learning · Computer Science 2017-11-07 Jingyang Zhu, Jingbo Jiang, Xizi Chen, Chi-Ying Tsui

FlexVector: A SpMM Vector Processor with Flexible VRF for GCNs on Varying-Sparsity Graphs

Graph Convolutional Networks (GCNs) are widely adopted for tasks involving relational or graph-structured data and can be formulated as two-stage sparse-dense matrix multiplication (SpMM) during inference. However, existing accelerators…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-14 Bohan Li, Shengmin Li, Xinyu Shi, Enyi Yao, Francky Catthoor, Simei Yang

FlexNN: A Dataflow-aware Flexible Deep Learning Accelerator for Energy-Efficient Edge Devices

This paper introduces FlexNN, a Flexible Neural Network accelerator, which adopts agile design principles to enable versatile dataflows, enhancing energy efficiency. Unlike conventional convolutional neural network accelerator architectures…

Hardware Architecture · Computer Science 2025-06-27 Arnab Raha, Deepak A. Mathaikutty, Soumendu K. Ghosh, Shamik Kundu

Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction

Sparse matrix-vector and matrix-matrix multiplication (SpMV and SpMM) are fundamental in both conventional (graph analytics, scientific computing) and emerging (sparse DNN, GNN) domains. Workload-balancing and parallel-reduction are…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-15 Guyue Huang, Guohao Dai, Yu Wang, Yufei Ding, Yuan Xie

SpArch: Efficient Architecture for Sparse Matrix Multiplication

Generalized Sparse Matrix-Matrix Multiplication (SpGEMM) is a ubiquitous task in various engineering and scientific applications. However, inner product based SpGEMM introduces redundant input fetches for mismatched nonzero operands, while…

Hardware Architecture · Computer Science 2024-04-05 Zhekai Zhang, Hanrui Wang, Song Han, William J. Dally

NM-SpMM: Accelerating Matrix Multiplication Using N:M Sparsity with GPGPU

Deep learning demonstrates effectiveness across a wide range of tasks. However, the dense and over-parameterized nature of these models results in significant resource consumption during deployment. In response to this issue, weight…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-05 Cong Ma, Du Wu, Zhelang Deng, Jiang Chen, Xiaowen Huang, Jintao Meng, Wenxi Zhu, Bingqiang Wang, Amelie Chi Zhou, Peng Chen, Minwen Deng, Yanjie Wei, Shengzhong Feng, Yi Pan

An Efficient Hardware Accelerator for Structured Sparse Convolutional Neural Networks on FPGAs

Deep Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in a wide range of applications. However, deeper CNN models, which are usually computation consuming, are widely required for complex Artificial…

Systems and Control · Electrical Eng. & Systems 2020-01-08 Chaoyang Zhu, Kejie Huang, Shuyuan Yang, Ziqi Zhu, Hejia Zhang, Haibin Shen

RAMAN: A Re-configurable and Sparse tinyML Accelerator for Inference on Edge

Deep Neural Network (DNN) based inference at the edge is challenging as these compute and data-intensive algorithms need to be implemented at low cost and low power while meeting the latency constraints of the target applications. Sparsity,…

Neural and Evolutionary Computing · Computer Science 2023-06-13 Adithya Krishna, Srikanth Rohit Nudurupati, Chandana D G, Pritesh Dwivedi, André van Schaik, Mahesh Mehendale, Chetan Singh Thakur

FSpGEMM: An OpenCL-based HPC Framework for Accelerating General Sparse Matrix-Matrix Multiplication on FPGAs

General sparse matrix-matrix multiplication (SpGEMM) is an integral part of many scientific computing, high-performance computing (HPC), and graph analytic applications. This paper presents a new compressed sparse vector (CSV) format for…

Performance · Computer Science 2021-12-21 Erfan Bank Tavakoli, Michael Riera, Masudul Hassan Quraishi, Fengbo Ren

Sparse Matrix to Matrix Multiplication: A Representation and Architecture for Acceleration (long version)

Accelerators for sparse matrix multiplication are important components in emerging systems. In this paper, we study the main challenges of accelerating Sparse Matrix Multiplication (SpMM). For the situations that data is not stored in the…

Hardware Architecture · Computer Science 2019-06-04 Pareesa Ameneh Golnari, Sharad Malik

Sparsity-Aware Streaming SNN Accelerator with Output-Channel Dataflow for Automatic Modulation Classification

The rapid advancement of wireless communication technologies, including 5G, emerging 6G networks, and the large-scale deployment of the Internet of Things (IoT), has intensified the need for efficient spectrum utilization. Automatic…

Hardware Architecture · Computer Science 2026-01-07 Kuilian Yang, Li Zhang, Ahmed M. Eltawil, Khaled Nabil Salama

LoAS: Fully Temporal-Parallel Dataflow for Dual-Sparse Spiking Neural Networks

Spiking Neural Networks (SNNs) have gained significant research attention in the last decade due to their potential to drive resource-constrained edge devices. Though existing SNN accelerators offer high efficiency in processing sparse…

Hardware Architecture · Computer Science 2024-09-04 Ruokai Yin, Youngeun Kim, Di Wu, Priyadarshini Panda

ParamSpMM: Adaptive and Efficient Sparse Matrix-Matrix Multiplication on GPUs for GNNs

Fueled by the ability to mine real-world graph data, GNN applications have experienced phenomenal growth. Sparse Matrix-Matrix Multiplication (SpMM) is a critical operator in GNNs. However, existing SpMM designs for GNNs struggle to adapt…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-18 Lixing Zhang, Guanhua Ye, Hongzheng Li, Shigang Li, Yingxia Shao

Sparse Periodic Systolic Dataflow for Lowering Latency and Power Dissipation of Convolutional Neural Network Accelerators

This paper introduces the sparse periodic systolic (SPS) dataflow, which advances the state-of-the-art hardware accelerator for supporting lightweight neural networks. Specifically, the SPS dataflow enables a novel hardware design approach…

Computer Vision and Pattern Recognition · Computer Science 2022-07-04 Jung Hwan Heo, Arash Fayyazi, Amirhossein Esmaili, Massoud Pedram

FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement

With the increasing data volume, there is a trend of using large-scale pre-trained models to store the knowledge into an enormous number of model parameters. The training of these models is composed of lots of dense algebras, requiring a…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-11 Xiaonan Nie, Xupeng Miao, Zilong Wang, Zichao Yang, Jilong Xue, Lingxiao Ma, Gang Cao, Bin Cui

SparseByteNN: A Novel Mobile Inference Acceleration Framework Based on Fine-Grained Group Sparsity

To address the challenge of increasing network size, researchers have developed sparse models through network pruning. However, maintaining model accuracy while achieving significant speedups on general computing devices remains an open…

Artificial Intelligence · Computer Science 2023-10-31 Haitao Xu, Songwei Liu, Yuyang Xu, Shuai Wang, Jiashi Li, Chenqian Yan, Liangqiang Li, Lean Fu, Xin Pan, Fangmin Chen

Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores

General-purpose Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental kernel in scientific computing and deep learning. The emergence of new matrix computation units such as Tensor Cores (TCs) brings more opportunities for SpMM…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-17 Haisha Zhao, San Li, Jiaheng Wang, Chunbao Zhou, Jue Wang, Zhikuang Xin, Shunde Li, Zhiqiang Liang, Zhijie Pan, Fang Liu, Yan Zeng, Yangang Wang, Xuebin Chi