English
Related papers

Related papers: Extending Sparse Tensor Accelerators to Support Mu…

200 papers

Machine learning (ML) models are widely used in many important domains. For efficiently processing these computational- and memory-intensive applications, tensors of these over-parameterized models are compressed by leveraging sparsity,…

Hardware Architecture · Computer Science 2021-08-11 Shail Dave , Riyadh Baghdadi , Tony Nowatzki , Sasikanth Avancha , Aviral Shrivastava , Baoxin Li

The growing scale of large language models (LLMs) has intensified demands on computation and memory, making efficient inference a key challenge. While sparsity can reduce these costs, existing design space exploration (DSE) frameworks often…

Hardware Architecture · Computer Science 2026-03-13 Junyi Wu , Chao Fang , Zhongfeng Wang

Recently, numerous sparse hardware accelerators for Deep Neural Networks (DNNs), Graph Neural Networks (GNNs), and scientific computing applications have been proposed. A common characteristic among all of these accelerators is that they…

Sparse matrices are the key ingredients of several application domains, from scientific computation to machine learning. The primary challenge with sparse matrices has been efficiently storing and transferring data, for which many sparse…

Hardware Architecture · Computer Science 2023-05-12 Bahar Asgari , Ramyad Hadidi , Joshua Dierberger , Charlotte Steinichen , Amaan Marfatia , Hyesoon Kim

Tensors play a vital role in machine learning (ML) and often exhibit properties best explored while maintaining high-order. Efficiently performing ML computations requires taking advantage of sparsity, but generalized hardware support is…

Hardware Architecture · Computer Science 2024-04-26 Gabriel Kulp , Andrew Ensinger , Lizhong Chen

High-energy, large-scale particle colliders in nuclear and high-energy physics generate data at extraordinary rates, reaching up to $1$ terabyte and several petabytes per second, respectively. The development of real-time, high-throughput…

Artificial Intelligence · Computer Science 2024-12-03 Xihaier Luo , Samuel Lurvey , Yi Huang , Yihui Ren , Jin Huang , Byung-Jun Yoon

Exploiting sparsity in deep neural networks (DNNs) has been a promising area for meeting the growing computation requirements. To minimize the overhead of sparse acceleration, hardware designers have proposed structured sparsity support,…

Machine Learning · Computer Science 2025-05-27 Geonhwa Jeong , Po-An Tsai , Abhimanyu R. Bambhaniya , Stephen W. Keckler , Tushar Krishna

Sparse tensors are rapidly becoming critical components of modern deep learning workloads. However, developing high-performance sparse operators can be difficult and tedious, and existing vendor libraries cannot satisfy the escalating…

Machine Learning · Computer Science 2023-02-22 Zihao Ye , Ruihang Lai , Junru Shao , Tianqi Chen , Luis Ceze

Long contexts improve capabilities of large language models but pose serious hardware challenges: compute and memory footprints grow linearly with sequence length. Particularly, the decoding phase continuously accesses massive KV cache,…

Hardware Architecture · Computer Science 2026-04-29 Wang Fan , Wei Cao , Xi Zha , Kedi Ma , MingQian Sun , Jialin Chen , Fengzhe Zhang , Fan Zhang

Tensor algebra is a crucial component for data-intensive workloads such as machine learning and scientific computing. As the complexity of data grows, scientists often encounter a dilemma between the highly specialized dense tensor algebra…

Programming Languages · Computer Science 2024-07-19 Mahdi Ghorbani , Emilien Bauer , Tobias Grosser , Amir Shaikhha

High-dimensional data has become ubiquitous across the sciences but presents computational and statistical challenges. A common approach to addressing these challenges is through sparsity. In this paper, we introduce a new concept of…

Statistics Theory · Mathematics 2025-09-03 Ali Mohades , Johannes Lederer

TensorDash is a hardware level technique for enabling data-parallel MAC units to take advantage of sparsity in their input operand streams. When used to compose a hardware accelerator for deep learning, TensorDash can speedup the training…

Hardware Architecture · Computer Science 2022-03-28 Mostafa Mahmoud , Isak Edo , Ali Hadi Zadeh , Omar Mohamed Awad , Gennady Pekhimenko , Jorge Albericio , Andreas Moshovos

Neural networks have proven to be extremely powerful tools for modern artificial intelligence applications, but computational and storage complexity remain limiting factors. This paper presents two compatible contributions towards reducing…

Machine Learning · Computer Science 2024-10-30 Sourya Dey , Kuan-Wen Huang , Peter A. Beerel , Keith M. Chugg

Tensor programs often need to process large tensors (vectors, matrices, or higher order tensors) that require a specialized storage format for their memory layout. Several such layouts have been proposed in the literature, such as the…

Databases · Computer Science 2022-10-13 Maximilian Schleich , Amir Shaikhha , Dan Suciu

New directions in computing and algorithms has lead to some new applications that have tolerance to imprecision. Although, These applications are creating large volumes of data which exceeds the capability of today's computing systems.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-01-16 Navid Mirnouri

Deep learning accelerators efficiently train over vast and growing amounts of data, placing a newfound burden on commodity networks and storage devices. A common approach to conserve bandwidth involves resizing or compressing data prior to…

Machine Learning · Computer Science 2021-08-13 Michael Kuchnik , George Amvrosiadis , Virginia Smith

Tensor accelerators have gained popularity because they provide a cheap and efficient solution for speeding up computational-expensive tasks in Deep Learning and, more recently, in other Scientific Computing applications. However, since…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-15 Paolo Sylos Labini , Massimo Bernaschi , Francesco Silvestri , Flavio Vella

We present a novel architecture for sparse pattern processing, using flash storage with embedded accelerators. Sparse pattern processing on large data sets is the essence of applications such as document search, natural language processing,…

Hardware Architecture · Computer Science 2017-01-25 Sang-Woo Jun , Huy T. Nguyen , Vijay N. Gadepally , Arvind

As neural network model sizes have dramatically increased, so has the interest in various techniques to reduce their parameter counts and accelerate their execution. An active area of research in this field is sparsity - encouraging zero…

Recent hardware acceleration advances have enabled powerful specialized accelerators for finite element computations, spiking neural network inference, and sparse tensor operations. However, existing approaches face fundamental limitations:…

Hardware Architecture · Computer Science 2026-01-09 Chuanzhen Wang , Leo Zhang , Eric Liu
‹ Prev 1 2 3 10 Next ›