English
Related papers

Related papers: In-Storage Embedded Accelerator for Sparse Pattern…

200 papers

We develop and implement in this paper a fast sparse assembly algorithm, the fundamental operation which creates a compressed matrix from raw index data. Since it is often a quite demanding and sometimes critical operation, it is of…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-04-28 Stefan Engblom , Dimitar Lukarski

Accelerators for sparse matrix multiplication are important components in emerging systems. In this paper, we study the main challenges of accelerating Sparse Matrix Multiplication (SpMM). For the situations that data is not stored in the…

Hardware Architecture · Computer Science 2019-06-04 Pareesa Ameneh Golnari , Sharad Malik

Embedded applications are widely used in portable devices such as wireless phones, personal digital assistants, laptops, etc. High throughput and real time requirements are especially important in such data-intensive tasks. Therefore,…

Hardware Architecture · Computer Science 2012-04-13 Mehdi Alipour , Mostafa E. Salehi

Sparsity, which occurs in both scientific applications and Deep Learning (DL) models, has been a key target of optimization within recent ASIC accelerators due to the potential memory and compute savings. These applications use data stored…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-22 Eric Qin , Geonhwa Jeong , William Won , Sheng-Chun Kao , Hyoukjun Kwon , Sudarshan Srinivasan , Dipankar Das , Gordon E. Moon , Sivasankaran Rajamanickam , Tushar Krishna

Despite the importance of sparse matrices in numerous fields of science, software implementations remain difficult to use for non-expert users, generally requiring the understanding of underlying details of the chosen sparse matrix storage…

Mathematical Software · Computer Science 2019-07-23 Conrad Sanderson , Ryan Curtin

The parallel algorithm for loading large sparse matrices from files into distributed memories of high performance computing (HPC) systems is presented. This algorithm was designed specially for matrices stored in files in the space-effcient…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-12-30 Daniel Langr , Ivan Šimeček , Pavel Tvrdík

This paper presents a programmable in-memory-computing processor, demonstrated in a 65nm CMOS technology. For data-centric workloads, such as deep neural networks, data movement often dominates when implemented with today's computing…

Hardware Architecture · Computer Science 2020-09-17 Hongyang Jia , Yinqi Tang , Hossein Valavi , Jintao Zhang , Naveen Verma

In the future, embedded processors must process more computation-intensive network applications and internet traffic and packet-processing tasks become heavier and sophisticated. Since the processor performance is severely related to the…

Hardware Architecture · Computer Science 2012-05-10 Mehdi Alipour , Mostafa E. Salehi , Hesamodin shojaei baghini

Tensor accelerators have gained popularity because they provide a cheap and efficient solution for speeding up computational-expensive tasks in Deep Learning and, more recently, in other Scientific Computing applications. However, since…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-15 Paolo Sylos Labini , Massimo Bernaschi , Francesco Silvestri , Flavio Vella

Exploiting sparsity underlying neural networks has become one of the most potential methodologies to reduce the memory footprint, I/O cost, and computation workloads during inference. And the degree of sparsity one can exploit has become…

Hardware Architecture · Computer Science 2022-07-19 Ian En-Hsu Yen , Zhibin Xiao , Dongkuan Xu

Machine learning (ML) models are widely used in many important domains. For efficiently processing these computational- and memory-intensive applications, tensors of these over-parameterized models are compressed by leveraging sparsity,…

Hardware Architecture · Computer Science 2021-08-11 Shail Dave , Riyadh Baghdadi , Tony Nowatzki , Sasikanth Avancha , Aviral Shrivastava , Baoxin Li

The ongoing trend of hardware specialization has led to a growing use of custom data formats when processing sparse workloads, which are typically memory-bound. These formats facilitate optimized software/hardware implementations by…

Computation and Language · Computer Science 2024-03-12 Jie Liu , Zhongyuan Zhao , Zijian Ding , Benjamin Brock , Hongbo Rong , Zhiru Zhang

State-of-the-art Transformer-based models, with gigantic parameters, are difficult to be accommodated on resource constrained embedded devices. Moreover, with the development of technology, more and more embedded devices are available to…

Machine Learning · Computer Science 2021-10-20 Panjie Qi , Edwin Hsing-Mean Sha , Qingfeng Zhuge , Hongwu Peng , Shaoyi Huang , Zhenglun Kong , Yuhong Song , Bingbing Li

Sparse Matrix-matrix Multiplication (SpMM) and Sampled Dense-dense Matrix Multiplication (SDDMM) are important sparse operators in scientific computing and deep learning. Tensor Core Units (TCUs) enhance modern accelerators with superior…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-17 Jinliang Shi , Shigang Li , Youxuan Xu , Rongtian Fu , Xueying Wang , Tong Wu

Sparse-dense linear algebra is crucial in many domains, but challenging to handle efficiently on CPUs, GPUs, and accelerators alike; multiplications with sparse formats like CSR and CSF require indirect memory lookups. In this work, we…

Hardware Architecture · Computer Science 2020-12-15 Paul Scheffler , Florian Zaruba , Fabian Schuiki , Torsten Hoefler , Luca Benini

When implementing functionality which requires sparse matrices, there are numerous storage formats to choose from, each with advantages and disadvantages. To achieve good performance, several formats may need to be used in one program,…

Mathematical Software · Computer Science 2019-10-22 Conrad Sanderson , Ryan Curtin

The Transformer has been an indispensable staple in deep learning. However, for real-life applications, it is very challenging to deploy efficient Transformers due to immense parameters and operations of models. To relieve this burden,…

Hardware Architecture · Computer Science 2022-11-01 Chao Fang , Aojun Zhou , Zhongfeng Wang

Exploiting sparsity in deep neural networks (DNNs) has been a promising area for meeting the growing computation requirements. To minimize the overhead of sparse acceleration, hardware designers have proposed structured sparsity support,…

Machine Learning · Computer Science 2025-05-27 Geonhwa Jeong , Po-An Tsai , Abhimanyu R. Bambhaniya , Stephen W. Keckler , Tushar Krishna

As neural network model sizes have dramatically increased, so has the interest in various techniques to reduce their parameter counts and accelerate their execution. An active area of research in this field is sparsity - encouraging zero…

We present a comprehensive framework for structured sparse coding and modeling extending the recent ideas of using learnable fast regressors to approximate exact sparse codes. For this purpose, we develop a novel block-coordinate proximal…

Machine Learning · Computer Science 2012-06-22 Alex Bronstein , Pablo Sprechmann , Guillermo Sapiro
‹ Prev 1 2 3 10 Next ›