Related papers: In-Storage Embedded Accelerator for Sparse Pattern…

Fast Matlab compatible sparse assembly on multicore computers

We develop and implement in this paper a fast sparse assembly algorithm, the fundamental operation which creates a compressed matrix from raw index data. Since it is often a quite demanding and sometimes critical operation, it is of…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-04-28 Stefan Engblom, Dimitar Lukarski

Sparse Matrix to Matrix Multiplication: A Representation and Architecture for Acceleration (long version)

Accelerators for sparse matrix multiplication are important components in emerging systems. In this paper, we study the main challenges of accelerating Sparse Matrix Multiplication (SpMM). For the situations that data is not stored in the…

Hardware Architecture · Computer Science 2019-06-04 Pareesa Ameneh Golnari, Sharad Malik

Performance-Optimum Superscalar Architecture for Embedded Applications

Embedded applications are widely used in portable devices such as wireless phones, personal digital assistants, laptops, etc. High throughput and real time requirements are especially important in such data-intensive tasks. Therefore,…

Hardware Architecture · Computer Science 2012-04-13 Mehdi Alipour, Mostafa E. Salehi

Extending Sparse Tensor Accelerators to Support Multiple Compression Formats

Sparsity, which occurs in both scientific applications and Deep Learning (DL) models, has been a key target of optimization within recent ASIC accelerators due to the potential memory and compute savings. These applications use data stored…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-22 Eric Qin, Geonhwa Jeong, William Won, Sheng-Chun Kao, Hyoukjun Kwon, Sudarshan Srinivasan, Dipankar Das, Gordon E. Moon, Sivasankaran Rajamanickam, Tushar Krishna

Practical Sparse Matrices in C++ with Hybrid Storage and Template-Based Expression Optimisation

Despite the importance of sparse matrices in numerous fields of science, software implementations remain difficult to use for non-expert users, generally requiring the understanding of underlying details of the chosen sparse matrix storage…

Mathematical Software · Computer Science 2019-07-23 Conrad Sanderson, Ryan Curtin

Loading Large Sparse Matrices Stored in Files in the Adaptive-Blocking Hierarchical Storage Format

The parallel algorithm for loading large sparse matrices from files into distributed memories of high performance computing (HPC) systems is presented. This algorithm was designed specially for matrices stored in files in the space-efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-12-30 Daniel Langr, Ivan Šimeček, Pavel Tvrdík

A Microprocessor implemented in 65nm CMOS with Configurable and Bit-scalable Accelerator for Programmable In-memory Computing

This paper presents a programmable in-memory-computing processor, demonstrated in a 65nm CMOS technology. For data-centric workloads, such as deep neural networks, data movement often dominates when implemented with today's computing…

Hardware Architecture · Computer Science 2020-09-17 Hongyang Jia, Yinqi Tang, Hossein Valavi, Jintao Zhang, Naveen Verma

Design Space Exploration to Find the Optimum Cache and Register File Size for Embedded Applications

In the future, embedded processors must process more computation-intensive network applications and internet traffic and packet-processing tasks become heavier and sophisticated. Since the processor performance is severely related to the…

Hardware Architecture · Computer Science 2012-05-10 Mehdi Alipour, Mostafa E. Salehi, Hesamodin shojaei baghini

Blocking Techniques for Sparse Matrix Multiplication on Tensor Accelerators

Tensor accelerators have gained popularity because they provide a cheap and efficient solution for speeding up computational-expensive tasks in Deep Learning and, more recently, in other Scientific Computing applications. However, since…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-15 Paolo Sylos Labini, Massimo Bernaschi, Francesco Silvestri, Flavio Vella

S4: a High-sparsity, High-performance AI Accelerator

Exploiting sparsity underlying neural networks has become one of the most potential methodologies to reduce the memory footprint, I/O cost, and computation workloads during inference. And the degree of sparsity one can exploit has become…

Hardware Architecture · Computer Science 2022-07-19 Ian En-Hsu Yen, Zhibin Xiao, Dongkuan Xu

Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights

Machine learning (ML) models are widely used in many important domains. For efficiently processing these computational- and memory-intensive applications, tensors of these over-parameterized models are compressed by leveraging sparsity,…

Hardware Architecture · Computer Science 2021-08-11 Shail Dave, Riyadh Baghdadi, Tony Nowatzki, Sasikanth Avancha, Aviral Shrivastava, Baoxin Li

UniSparse: An Intermediate Language for General Sparse Format Customization

The ongoing trend of hardware specialization has led to a growing use of custom data formats when processing sparse workloads, which are typically memory-bound. These formats facilitate optimized software/hardware implementations by…

Computation and Language · Computer Science 2024-03-12 Jie Liu, Zhongyuan Zhao, Zijian Ding, Benjamin Brock, Hongbo Rong, Zhiru Zhang

Accelerating Framework of Transformer by Hardware Design and Model Compression Co-Optimization

State-of-the-art Transformer-based models, with gigantic parameters, are difficult to be accommodated on resource constrained embedded devices. Moreover, with the development of technology, more and more embedded devices are available to…

Machine Learning · Computer Science 2021-10-20 Panjie Qi, Edwin Hsing-Mean Sha, Qingfeng Zhuge, Hongwu Peng, Shaoyi Huang, Zhenglun Kong, Yuhong Song, Bingbing Li

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

Sparse Matrix-matrix Multiplication (SpMM) and Sampled Dense-dense Matrix Multiplication (SDDMM) are important sparse operators in scientific computing and deep learning. Tensor Core Units (TCUs) enhance modern accelerators with superior…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-17 Jinliang Shi, Shigang Li, Youxuan Xu, Rongtian Fu, Xueying Wang, Tong Wu

Indirection Stream Semantic Register Architecture for Efficient Sparse-Dense Linear Algebra

Sparse-dense linear algebra is crucial in many domains, but challenging to handle efficiently on CPUs, GPUs, and accelerators alike; multiplications with sparse formats like CSR and CSF require indirect memory lookups. In this work, we…

Hardware Architecture · Computer Science 2020-12-15 Paul Scheffler, Florian Zaruba, Fabian Schuiki, Torsten Hoefler, Luca Benini

A User-Friendly Hybrid Sparse Matrix Class in C++

When implementing functionality which requires sparse matrices, there are numerous storage formats to choose from, each with advantages and disadvantages. To achieve good performance, several formats may need to be used in one program,…

Mathematical Software · Computer Science 2019-10-22 Conrad Sanderson, Ryan Curtin

An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers

The Transformer has been an indispensable staple in deep learning. However, for real-life applications, it is very challenging to deploy efficient Transformers due to immense parameters and operations of models. To relieve this burden,…

Hardware Architecture · Computer Science 2022-11-01 Chao Fang, Aojun Zhou, Zhongfeng Wang

Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators

Exploiting sparsity in deep neural networks (DNNs) has been a promising area for meeting the growing computation requirements. To minimize the overhead of sparse acceleration, hardware designers have proposed structured sparsity support,…

Machine Learning · Computer Science 2025-05-27 Geonhwa Jeong, Po-An Tsai, Abhimanyu R. Bambhaniya, Stephen W. Keckler, Tushar Krishna

Accelerating Sparse Deep Neural Networks

As neural network model sizes have dramatically increased, so has the interest in various techniques to reduce their parameter counts and accelerate their execution. An active area of research in this field is sparsity - encouraging zero…

Machine Learning · Computer Science 2021-04-20 Asit Mishra, Jorge Albericio Latorre, Jeff Pool, Darko Stosic, Dusan Stosic, Ganesh Venkatesh, Chong Yu, Paulius Micikevicius

Learning Efficient Structured Sparse Models

We present a comprehensive framework for structured sparse coding and modeling extending the recent ideas of using learnable fast regressors to approximate exact sparse codes. For this purpose, we develop a novel block-coordinate proximal…

Machine Learning · Computer Science 2012-06-22 Alex Bronstein, Pablo Sprechmann, Guillermo Sapiro