Related papers: Efficient Sparse Processing-in-Memory Architecture…

SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems

Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures. Near-bank PIM architectures place simple cores close to DRAM banks and can yield significant performance and energy improvements…

Hardware Architecture · Computer Science · 2022-05-24 · Christina Giannoula, Ivan Fernandez, Juan Gómez-Luna, Nectarios Koziris, Georgios Goumas, Onur Mutlu

Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems

Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures. Near-bank PIM architectures place simple cores close to DRAM banks and can yield significant performance and energy improvements…

Hardware Architecture · Computer Science · 2022-04-05 · Christina Giannoula, Ivan Fernandez, Juan Gómez-Luna, Nectarios Koziris, Georgios Goumas, Onur Mutlu

An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs

In recent years, Transformer-based language models have become the standard approach for natural language processing tasks. However, stringent throughput and latency requirements in industrial applications are limiting their adoption. To…

Machine Learning · Computer Science · 2023-06-30 · Haihao Shen, Hengyu Meng, Bo Dong, Zhe Wang, Ofir Zafrir, Yi Ding, Yu Luo, Hanwen Chang, Qun Gao, Ziheng Wang, Guy Boudoukh, Moshe Wasserblat

A Low-Power Sparse Deep Learning Accelerator with Optimized Data Reuse

Sparse deep learning has reduced computation significantly, but its irregular non-zero data distribution complicates the data flow and hinders data reuse, increasing on-chip SRAM access and thus power consumption of the chip. This paper…

Hardware Architecture · Computer Science · 2025-03-26 · Kai-Chieh Hsu, Tian-Sheuan Chang

Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction

Sparse matrix-vector and matrix-matrix multiplication (SpMV and SpMM) are fundamental in both conventional (graph analytics, scientific computing) and emerging (sparse DNN, GNN) domains. Workload-balancing and parallel-reduction are…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-10-15 · Guyue Huang, Guohao Dai, Yu Wang, Yufei Ding, Yuan Xie

Sparsity-Aware Roofline Models for Sparse Matrix-Matrix Multiplication

Sparse matrix-dense matrix multiplication (SpMM) is a critical kernel in scientific computing, graph analytics, and machine learning, whose performance is often constrained by memory bandwidth. In this work, we investigate the applicability…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-04-09 · Matthew Qian, Yahia Ramadan, Suhita Anubha, Ariful Azad

Semi-External Memory Sparse Matrix Multiplication for Billion-Node Graphs

Sparse matrix multiplication is traditionally performed in memory and scales to large matrices using the distributed memory of multiple nodes. In contrast, we scale sparse matrix multiplication beyond memory capacity by implementing sparse…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-11-15 · Da Zheng, Disa Mhembere, Vince Lyzinski, Joshua Vogelstein, Carey E. Priebe, Randal Burns

Efficient In-Memory Acceleration of Sparse Block Diagonal LLMs

Structured sparsity enables deploying large language models (LLMs) on resource-constrained systems. Approaches like dense-to-sparse fine-tuning are particularly compelling, achieving remarkable structured sparsity by reducing the model size…

Hardware Architecture · Computer Science · 2025-10-14 · João Paulo Cardoso de Lima, Marc Dietrich, Jeronimo Castrillon, Asif Ali Khan

Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication

Sparse matrix-matrix multiplication (SpGEMM) is a critical operation in numerous fields, including scientific computing, graph analytics, and deep learning. These applications exploit the sparsity of matrices to reduce storage and…

Machine Learning · Computer Science · 2024-08-30 · Sanjali Yadav, Bahar Asgari

NM-SpMM: Accelerating Matrix Multiplication Using N:M Sparsity with GPGPU

Deep learning demonstrates effectiveness across a wide range of tasks. However, the dense and over-parameterized nature of these models results in significant resource consumption during deployment. In response to this issue, weight…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-03-05 · Cong Ma, Du Wu, Zhelang Deng, Jiang Chen, Xiaowen Huang, Jintao Meng, Wenxi Zhu, Bingqiang Wang, Amelie Chi Zhou, Peng Chen, Minwen Deng, Yanjie Wei, Shengzhong Feng, Yi Pan

SparseZipper: Enhancing Matrix Extensions to Accelerate SpGEMM on CPUs

The importance of general matrix multiplication (GEMM) is motivating new instruction set extensions for multiplying dense matrices in almost all contemporary ISAs, and these extensions are often implemented using high-performance systolic…

Hardware Architecture · Computer Science · 2025-02-18 · Tuan Ta, Joshua Randall, Christopher Batten

Efficient SRAM-PIM Co-design by Joint Exploration of Value-Level and Bit-Level Sparsity

Processing-in-memory (PIM) is a transformative architectural paradigm designed to overcome the Von Neumann bottleneck. Among PIM architectures, digital SRAM-PIM emerges as a promising solution, offering significant advantages by directly…

Hardware Architecture · Computer Science · 2025-06-13 · Cenlin Duan, Jianlei Yang, Yikun Wang, Yiou Wang, Yingjie Qi, Xiaolin He, Bonan Yan, Xueyan Wang, Xiaotao Jia, Weisheng Zhao

SPINN: Sparse, Physics-based, and partially Interpretable Neural Networks for PDEs

We introduce a class of Sparse, Physics-based, and partially Interpretable Neural Networks (SPINN) for solving ordinary and partial differential equations (PDEs). By reinterpreting a traditional meshless representation of solutions of PDEs…

Machine Learning · Computer Science · 2021-08-13 · Amuthan A. Ramabathiran, Prabhu Ramachandran

Accelerating Unstructured SpGEMM using Structured In-situ Computing

Sparse matrix-matrix multiplication (SpGEMM) is a critical kernel widely employed in machine learning and graph algorithms. However, real-world matrices' high sparsity makes SpGEMM memory-intensive. In-situ computing offers the potential to…

Hardware Architecture · Computer Science · 2023-11-08 · Huize Li, Tulika Mitra

SMASH: Sparse Matrix Atomic Scratchpad Hashing

Sparse matrices, more specifically SpGEMM kernels, are commonly found in a wide range of applications, spanning graph-based path-finding to machine learning algorithms (e.g., neural networks). A particular challenge in implementing SpGEMM…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-06-01 · Kaustubh Shivdikar

Bandwidth-Optimized Parallel Algorithms for Sparse Matrix-Matrix Multiplication using Propagation Blocking

Sparse matrix-matrix multiplication (SpGEMM) is a widely used kernel in various graph, scientific computing and machine learning algorithms. It is well known that SpGEMM is a memory-bound operation, and its peak performance is expected to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-02-27 · Zhixiang Gu, Jose Moreira, David Edelsohn, Ariful Azad

SparseByteNN: A Novel Mobile Inference Acceleration Framework Based on Fine-Grained Group Sparsity

To address the challenge of increasing network size, researchers have developed sparse models through network pruning. However, maintaining model accuracy while achieving significant speedups on general computing devices remains an open…

Artificial Intelligence · Computer Science · 2023-10-31 · Haitao Xu, Songwei Liu, Yuyang Xu, Shuai Wang, Jiashi Li, Chenqian Yan, Liangqiang Li, Lean Fu, Xin Pan, Fangmin Chen

Design Principles for Sparse Matrix Multiplication on the GPU

We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-06-13 · Carl Yang, Aydin Buluc, John D. Owens

Empowering Malware Detection Efficiency within Processing-in-Memory Architecture

The widespread integration of embedded systems across various industries has facilitated seamless connectivity among devices and bolstered computational capabilities. Despite their extensive applications, embedded systems encounter…

Cryptography and Security · Computer Science · 2024-04-16 · Sreenitha Kasarapu, Sathwika Bavikadi, Sai Manoj Pudukotai Dinakarrao

Intrinsically Sparse Long Short-Term Memory Networks

Long Short-Term Memory (LSTM) has achieved state-of-the-art performances on a wide range of tasks. Its outstanding performance is guaranteed by the long-term memory ability which matches the sequential data perfectly and the gating…

Neural and Evolutionary Computing · Computer Science · 2019-01-29 · Shiwei Liu, Decebal Constantin Mocanu, Mykola Pechenizkiy