Related papers: TeAAL: A Declarative Framework for Modeling Sparse…

Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling

In recent years, many accelerators have been proposed to efficiently process sparse tensor algebra applications (e.g., sparse neural networks). However, these proposals are single points in a large and diverse design space. The lack of…

Hardware Architecture · Computer Science 2023-01-11 Yannan Nellie Wu, Po-An Tsai, Angshuman Parashar, Vivienne Sze, Joel S. Emer

RTeAAL Sim: Using Tensor Algebra to Represent and Accelerate RTL Simulation (Extended Version)

RTL simulation on CPUs remains a persistent bottleneck in hardware design. State-of-the-art simulators embed the circuit directly into the simulation binary, resulting in long compilation times and execution that is fundamentally CPU…

Hardware Architecture · Computer Science 2026-01-27 Yan Zhu, Boru Chen, Christopher W. Fletcher, Nandeeka Nayak

Enabling Flexibility for Sparse Tensor Acceleration via Heterogeneity

Recently, numerous sparse hardware accelerators for Deep Neural Networks (DNNs), Graph Neural Networks (GNNs), and scientific computing applications have been proposed. A common characteristic among all of these accelerators is that they…

Hardware Architecture · Computer Science 2022-01-25 Eric Qin, Raveesh Garg, Abhimanyu Bambhaniya, Michael Pellauer, Angshuman Parashar, Sivasankaran Rajamanickam, Cong Hao, Tushar Krishna

$\nabla$SD: Differentiable Programming for Sparse Tensors

Sparse tensors are prevalent in many data-intensive applications, yet existing differentiable programming frameworks are tailored towards dense tensors. This presents a significant challenge for efficiently computing gradients through…

Programming Languages · Computer Science 2023-03-14 Amir Shaikhha, Mathieu Huot, Shideh Hashemian

Training-Free Activation Sparsity in Large Language Models

Activation sparsity can enable practical inference speedups in large language models (LLMs) by reducing the compute and memory-movement required for matrix multiplications during the forward pass. However, existing methods face limitations…

Computation and Language · Computer Science 2025-02-27 James Liu, Pragaash Ponnusamy, Tianle Cai, Han Guo, Yoon Kim, Ben Athiwaratkun

TensorLib: A Spatial Accelerator Generation Framework for Tensor Algebra

Tensor algebra finds applications in various domains, and these applications, especially when accelerated on spatial hardware accelerators, can deliver high performance and low power. Spatial hardware accelerator exhibits complex design…

Hardware Architecture · Computer Science 2021-04-27 Liancheng Jia, Zizhang Luo, Liqiang Lu, Yun Liang

SparseMap: A Sparse Tensor Accelerator Framework Based on Evolution Strategy

The growing demand for sparse tensor algebra (SpTA) in machine learning and big data has driven the development of various sparse tensor accelerators. However, most existing manually designed accelerators are limited to specific scenarios,…

Machine Learning · Computer Science 2025-08-19 Boran Zhao, Haiming Zhai, Zihang Yuan, Hetian Liu, Tian Xia, Wenzhe Zhao, Pengju Ren

Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights

Machine learning (ML) models are widely used in many important domains. For efficiently processing these computational- and memory-intensive applications, tensors of these over-parameterized models are compressed by leveraging sparsity,…

Hardware Architecture · Computer Science 2021-08-11 Shail Dave, Riyadh Baghdadi, Tony Nowatzki, Sasikanth Avancha, Aviral Shrivastava, Baoxin Li

FLAASH: Flexible Accelerator Architecture for Sparse High-Order Tensor Contraction

Tensors play a vital role in machine learning (ML) and often exhibit properties best explored while maintaining high-order. Efficiently performing ML computations requires taking advantage of sparsity, but generalized hardware support is…

Hardware Architecture · Computer Science 2024-04-26 Gabriel Kulp, Andrew Ensinger, Lizhong Chen

A High-Performance Sparse Tensor Algebra Compiler in Multi-Level IR

Tensor algebra is widely used in many applications, such as scientific computing, machine learning, and data analytics. The tensors represented real-world data are usually large and sparse. There are tens of storage formats designed for…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-11 Ruiqin Tian, Luanzheng Guo, Jiajia Li, Bin Ren, Gokcen Kestor

Extending Sparse Tensor Accelerators to Support Multiple Compression Formats

Sparsity, which occurs in both scientific applications and Deep Learning (DL) models, has been a key target of optimization within recent ASIC accelerators due to the potential memory and compute savings. These applications use data stored…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-22 Eric Qin, Geonhwa Jeong, William Won, Sheng-Chun Kao, Hyoukjun Kwon, Sudarshan Srinivasan, Dipankar Das, Gordon E. Moon, Sivasankaran Rajamanickam, Tushar Krishna

Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity

Sparse tensor algebra is a challenging class of workloads to accelerate due to low arithmetic intensity and varying sparsity patterns. Prior sparse tensor algebra accelerators have explored tiling sparse data to increase exploitable data…

Hardware Architecture · Computer Science 2024-06-27 Zi Yu Xue, Yannan Nellie Wu, Joel S. Emer, Vivienne Sze

TENET: An Efficient Sparsity-Aware LUT-Centric Architecture for Ternary LLM Inference On Edge

Ternary quantization has emerged as a powerful technique for reducing both computational and memory footprint of large language models (LLM), enabling efficient real-time inference deployment without significantly compromising model…

Hardware Architecture · Computer Science 2025-09-18 Zhirui Huang, Rui Ma, Shijie Cao, Ran Shu, Ian Wang, Ting Cao, Chixiao Chen, Yongqiang Xiong

PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training

Deep learning (DL) models are piquing high interest and scaling at an unprecedented rate. To this end, a handful of tiled accelerators have been proposed to support such large-scale training tasks. However, these accelerators often…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-07 Jiahao Fang, Huizheng Wang, Qize Yang, Dehao Kong, Xu Dai, Jinyi Deng, Yang Hu, Shouyi Yin

XMoE: Sparse Models with Fine-grained and Adaptive Expert Selection

Sparse models, including sparse Mixture-of-Experts (MoE) models, have emerged as an effective approach for scaling Transformer models. However, they often suffer from computational inefficiency since a significant number of parameters are…

Machine Learning · Computer Science 2024-05-27 Yuanhang Yang, Shiyi Qi, Wenchao Gu, Chaozheng Wang, Cuiyun Gao, Zenglin Xu

Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators

Exploiting sparsity in deep neural networks (DNNs) has been a promising area for meeting the growing computation requirements. To minimize the overhead of sparse acceleration, hardware designers have proposed structured sparsity support,…

Machine Learning · Computer Science 2025-05-27 Geonhwa Jeong, Po-An Tsai, Abhimanyu R. Bambhaniya, Stephen W. Keckler, Tushar Krishna

Maple: A Processing Element for Row-Wise Product Based Sparse Tensor Accelerators

Sparse tensor computing is a core computational part of numerous applications in areas such as data science, graph processing, and scientific computing. Sparse tensors offer the potential of skipping unnecessary computations caused by zero…

Hardware Architecture · Computer Science 2023-03-28 Midia Reshadi, David Gregg

DISTAL: The Distributed Tensor Algebra Compiler

We introduce DISTAL, a compiler for dense tensor algebra that targets modern distributed and heterogeneous systems. DISTAL lets users independently describe how tensors and computation map onto target machines through separate format and…

Programming Languages · Computer Science 2022-03-18 Rohan Yadav, Alex Aiken, Fredrik Kjolstad

Sparse Tensor Algebra as a Parallel Programming Model

Dense and sparse tensors allow the representation of most bulk data structures in computational science applications. We show that sparse tensor algebra can also be used to express many of the transformations on these datasets, especially…

Mathematical Software · Computer Science 2015-12-02 Edgar Solomonik, Torsten Hoefler

Sgap: Towards Efficient Sparse Tensor Algebra Compilation for GPU

Sparse compiler is a promising solution for sparse tensor algebra optimization. In compiler implementation, reduction in sparse-dense hybrid algebra plays a key role in performance. Though GPU provides various reduction semantics that can…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-10 Genghan Zhang, Yuetong Zhao, Yanting Tao, Zhongming Yu, Guohao Dai, Sitao Huang, Yuan Wen, Pavlos Petoumenos, Yu Wang