Related papers: Extending Sparse Tensor Accelerators to Support Mu…

Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights

Machine learning (ML) models are widely used in many important domains. For efficiently processing these computational- and memory-intensive applications, tensors of these over-parameterized models are compressed by leveraging sparsity,…

Hardware Architecture · Computer Science 2021-08-11 Shail Dave, Riyadh Baghdadi, Tony Nowatzki, Sasikanth Avancha, Aviral Shrivastava, Baoxin Li

SnipSnap: A Joint Compression Format and Dataflow Co-Optimization Framework for Efficient Sparse LLM Accelerator Design

The growing scale of large language models (LLMs) has intensified demands on computation and memory, making efficient inference a key challenge. While sparsity can reduce these costs, existing design space exploration (DSE) frameworks often…

Hardware Architecture · Computer Science 2026-03-13 Junyi Wu, Chao Fang, Zhongfeng Wang

Enabling Flexibility for Sparse Tensor Acceleration via Heterogeneity

Recently, numerous sparse hardware accelerators for Deep Neural Networks (DNNs), Graph Neural Networks (GNNs), and scientific computing applications have been proposed. A common characteristic among all of these accelerators is that they…

Hardware Architecture · Computer Science 2022-01-25 Eric Qin, Raveesh Garg, Abhimanyu Bambhaniya, Michael Pellauer, Angshuman Parashar, Sivasankaran Rajamanickam, Cong Hao, Tushar Krishna

Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads

Sparse matrices are the key ingredients of several application domains, from scientific computation to machine learning. The primary challenge with sparse matrices has been efficiently storing and transferring data, for which many sparse…

Hardware Architecture · Computer Science 2023-05-12 Bahar Asgari, Ramyad Hadidi, Joshua Dierberger, Charlotte Steinichen, Amaan Marfatia, Hyesoon Kim

FLAASH: Flexible Accelerator Architecture for Sparse High-Order Tensor Contraction

Tensors play a vital role in machine learning (ML) and often exhibit properties best explored while maintaining high-order. Efficiently performing ML computations requires taking advantage of sparsity, but generalized hardware support is…

Hardware Architecture · Computer Science 2024-04-26 Gabriel Kulp, Andrew Ensinger, Lizhong Chen

Efficient Compression of Sparse Accelerator Data Using Implicit Neural Representations and Importance Sampling

High-energy, large-scale particle colliders in nuclear and high-energy physics generate data at extraordinary rates, reaching up to $1$ terabyte and several petabytes per second, respectively. The development of real-time, high-throughput…

Artificial Intelligence · Computer Science 2024-12-03 Xihaier Luo, Samuel Lurvey, Yi Huang, Yihui Ren, Jin Huang, Byung-Jun Yoon

Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators

Exploiting sparsity in deep neural networks (DNNs) has been a promising area for meeting the growing computation requirements. To minimize the overhead of sparse acceleration, hardware designers have proposed structured sparsity support,…

Machine Learning · Computer Science 2025-05-27 Geonhwa Jeong, Po-An Tsai, Abhimanyu R. Bambhaniya, Stephen W. Keckler, Tushar Krishna

SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning

Sparse tensors are rapidly becoming critical components of modern deep learning workloads. However, developing high-performance sparse operators can be difficult and tedious, and existing vendor libraries cannot satisfy the escalating…

Machine Learning · Computer Science 2023-02-22 Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, Luis Ceze

Salca: A Sparsity-Aware Hardware Accelerator for Efficient Long-Context Attention Decoding

Long contexts improve capabilities of large language models but pose serious hardware challenges: compute and memory footprints grow linearly with sequence length. Particularly, the decoding phase continuously accesses massive KV cache,…

Hardware Architecture · Computer Science 2026-04-29 Wang Fan, Wei Cao, Xi Zha, Kedi Ma, MingQian Sun, Jialin Chen, Fengzhe Zhang, Fan Zhang

Compressing Structured Tensor Algebra

Tensor algebra is a crucial component for data-intensive workloads such as machine learning and scientific computing. As the complexity of data grows, scientists often encounter a dilemma between the highly specialized dense tensor algebra…

Programming Languages · Computer Science 2024-07-19 Mahdi Ghorbani, Emilien Bauer, Tobias Grosser, Amir Shaikhha

Cardinality Sparsity: Applications in Matrix-Matrix Multiplications and Machine Learning

High-dimensional data has become ubiquitous across the sciences but presents computational and statistical challenges. A common approach to addressing these challenges is through sparsity. In this paper, we introduce a new concept of…

Statistics Theory · Mathematics 2025-09-03 Ali Mohades, Johannes Lederer

TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference

TensorDash is a hardware-level technique for enabling data-parallel MAC units to take advantage of sparsity in their input operand streams. When used to compose a hardware accelerator for deep learning, TensorDash can speed up the training…

Hardware Architecture · Computer Science 2022-03-28 Mostafa Mahmoud, Isak Edo, Ali Hadi Zadeh, Omar Mohamed Awad, Gennady Pekhimenko, Jorge Albericio, Andreas Moshovos

Pre-Defined Sparse Neural Networks with Hardware Acceleration

Neural networks have proven to be extremely powerful tools for modern artificial intelligence applications, but computational and storage complexity remain limiting factors. This paper presents two compatible contributions towards reducing…

Machine Learning · Computer Science 2024-10-30 Sourya Dey, Kuan-Wen Huang, Peter A. Beerel, Keith M. Chugg

Optimizing Tensor Programs on Flexible Storage

Tensor programs often need to process large tensors (vectors, matrices, or higher order tensors) that require a specialized storage format for their memory layout. Several such layouts have been proposed in the literature, such as the…

Databases · Computer Science 2022-10-13 Maximilian Schleich, Amir Shaikhha, Dan Suciu

Applying Data Compression Techniques on Systolic Neural Network Accelerator

New directions in computing and algorithms have led to some new applications that tolerate imprecision. However, these applications are creating large volumes of data that exceed the capability of today's computing systems.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-01-16 Navid Mirnouri

Progressive Compressed Records: Taking a Byte out of Deep Learning Data

Deep learning accelerators efficiently train over vast and growing amounts of data, placing a newfound burden on commodity networks and storage devices. A common approach to conserve bandwidth involves resizing or compressing data prior to…

Machine Learning · Computer Science 2021-08-13 Michael Kuchnik, George Amvrosiadis, Virginia Smith

Blocking Techniques for Sparse Matrix Multiplication on Tensor Accelerators

Tensor accelerators have gained popularity because they provide a cheap and efficient solution for speeding up computationally expensive tasks in Deep Learning and, more recently, in other Scientific Computing applications. However, since…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-15 Paolo Sylos Labini, Massimo Bernaschi, Francesco Silvestri, Flavio Vella

In-Storage Embedded Accelerator for Sparse Pattern Processing

We present a novel architecture for sparse pattern processing, using flash storage with embedded accelerators. Sparse pattern processing on large data sets is the essence of applications such as document search, natural language processing,…

Hardware Architecture · Computer Science 2017-01-25 Sang-Woo Jun, Huy T. Nguyen, Vijay N. Gadepally, Arvind

Accelerating Sparse Deep Neural Networks

As neural network model sizes have dramatically increased, so has the interest in various techniques to reduce their parameter counts and accelerate their execution. An active area of research in this field is sparsity - encouraging zero…

Machine Learning · Computer Science 2021-04-20 Asit Mishra, Jorge Albericio Latorre, Jeff Pool, Darko Stosic, Dusan Stosic, Ganesh Venkatesh, Chong Yu, Paulius Micikevicius

Memory-Guided Unified Hardware Accelerator for Mixed-Precision Scientific Computing

Recent hardware acceleration advances have enabled powerful specialized accelerators for finite element computations, spiking neural network inference, and sparse tensor operations. However, existing approaches face fundamental limitations:…

Hardware Architecture · Computer Science 2026-01-09 Chuanzhen Wang, Leo Zhang, Eric Liu