Related papers: A method for accelerating low precision operations…

Efficient Matrix Multiplication: The Sparse Power-of-2 Factorization

We present an algorithm to reduce the computational effort for the multiplication of a given matrix with an unknown column vector. The algorithm decomposes the given matrix into a product of matrices whose entries are either zero or integer…

Information Theory · Computer Science · 2020-02-28 · Ralf R. Müller, Bernhard Gäde, Ali Bereyhi

Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights

Machine learning (ML) models are widely used in many important domains. For efficiently processing these computational- and memory-intensive applications, tensors of these over-parameterized models are compressed by leveraging sparsity,…

Hardware Architecture · Computer Science · 2021-08-11 · Shail Dave, Riyadh Baghdadi, Tony Nowatzki, Sasikanth Avancha, Aviral Shrivastava, Baoxin Li

Estimating Multiple Precision Matrices with Cluster Fusion Regularization

We propose a penalized likelihood framework for estimating multiple precision matrices from different classes. Most existing methods either incorporate no information on relationships between the precision matrices, or require this…

Machine Learning · Statistics · 2020-03-03 · Bradley S. Price, Aaron J. Molstad, Ben Sherwood

A sparse-sampling approach for the fast computation of matrices: application to molecular vibrations

This article presents a new method to compute matrices from numerical simulations based on the ideas of sparse sampling and compressed sensing. The method is useful for problems where the determination of the entries of a matrix constitutes…

Chemical Physics · Physics · 2014-10-21 · Jacob N. Sanders, Xavier Andrade, Alán Aspuru-Guzik

Efficient Quantized Sparse Matrix Operations on Tensor Cores

The exponentially growing model size drives the continued success of deep learning, but it brings prohibitive computation and memory cost. From the algorithm perspective, model sparsification and quantization have been studied to alleviate…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-05-09 · Shigang Li, Kazuki Osawa, Torsten Hoefler

Fast multiplication of random dense matrices with fixed sparse matrices

This work focuses on accelerating the multiplication of a dense random matrix with a (fixed) sparse matrix, which is frequently used in sketching algorithms. We develop a novel scheme that takes advantage of blocking and recomputation…

Computational Engineering, Finance, and Science · Computer Science · 2024-05-14 · Tianyu Liang, Riley Murray, Aydın Buluç, James Demmel

Sparse approximate matrix-matrix multiplication for density matrix purification with error control

We propose a method for strict error control in sparse approximate matrix-matrix multiplication. The method combines an error bound and a parameter sweep to select an appropriate threshold value. The scheme for error control and the sparse…

Numerical Analysis · Mathematics · 2021-06-02 · Anton G. Artemov, Emanuel H. Rubensson

Sparse Matrix Multiplication On An Associative Processor

Sparse matrix multiplication is an important component of linear algebra computations. Implementing sparse matrix multiplication on an associative processor (AP) enables high level of parallelism, where a row of one matrix is multiplied in…

Mathematical Software · Computer Science · 2017-05-23 · L. Yavits, A. Morad, R. Ginosar

High-Accuracy Low-Precision Training

Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it. Still, it has been used primarily for inference - not training. Previous…

Machine Learning · Computer Science · 2018-03-12 · Christopher De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Christopher R. Aberger, Kunle Olukotun, Christopher Ré

Estimating sparse precision matrices

We apply a method recently introduced to the statistical literature to directly estimate the precision matrix from an ensemble of samples drawn from a corresponding Gaussian distribution. Motivated by the observation that cosmological…

Instrumentation and Methods for Astrophysics · Physics · 2016-05-25 · Nikhil Padmanabhan, Martin White, Harrison H. Zhou, Ross O'Connell

High Accuracy Low Precision QR Factorization and Least Square Solver on GPU with TensorCore

Driven by the insatiable needs to process ever larger amount of data with more complex models, modern computer processors and accelerators are beginning to offer half precision floating point arithmetic support, and extremely optimized…

Mathematical Software · Computer Science · 2019-12-12 · Shaoshuai Zhang, Panruo Wu

Error correction in fast matrix multiplication and inverse

We present new algorithms to detect and correct errors in the product of two matrices, or the inverse of a matrix, over an arbitrary field. Our algorithms do not require any additional information or encoding other than the original inputs…

Symbolic Computation · Computer Science · 2018-02-08 · Daniel S. Roche

Blocking Techniques for Sparse Matrix Multiplication on Tensor Accelerators

Tensor accelerators have gained popularity because they provide a cheap and efficient solution for speeding up computational-expensive tasks in Deep Learning and, more recently, in other Scientific Computing applications. However, since…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-02-15 · Paolo Sylos Labini, Massimo Bernaschi, Francesco Silvestri, Flavio Vella

Efficient Mixed-Precision Matrix Factorization of the Inverse Overlap Matrix in Electronic Structure Calculations with AI-Hardware and GPUs

In recent years, a new kind of accelerated hardware has gained popularity in the Artificial Intelligence (AI) and Machine Learning (ML) communities which enables extremely high-performance tensor contractions in reduced precision for deep…

Computational Physics · Physics · 2024-05-01 · Adela Habib, Joshua Finkelstein, Anders M. N. Niklasson

The Sparse Reverse of Principal Component Analysis for Fast Low-Rank Matrix Completion

Matrix completion constantly receives tremendous attention from many research fields. It is commonly applied for recommender systems such as movie ratings, computer vision such as image reconstruction or completion, multi-task learning such…

Machine Learning · Computer Science · 2019-10-08 · Abdallah Chehade, Zunya Shi

Memory-Guided Unified Hardware Accelerator for Mixed-Precision Scientific Computing

Recent hardware acceleration advances have enabled powerful specialized accelerators for finite element computations, spiking neural network inference, and sparse tensor operations. However, existing approaches face fundamental limitations:…

Hardware Architecture · Computer Science · 2026-01-09 · Chuanzhen Wang, Leo Zhang, Eric Liu

Sparse Matrix Multiplication on CAM Based Accelerator

Sparse matrix multiplication is an important component of linear algebra computations. In this paper, an architecture based on Content Addressable Memory (CAM) and Resistive Content Addressable Memory (ReCAM) is proposed for accelerating…

Hardware Architecture · Computer Science · 2017-05-30 · Leonid Yavits, Ran Ginosar

Accelerating Sparse Deep Neural Networks

As neural network model sizes have dramatically increased, so has the interest in various techniques to reduce their parameter counts and accelerate their execution. An active area of research in this field is sparsity - encouraging zero…

Machine Learning · Computer Science · 2021-04-20 · Asit Mishra, Jorge Albericio Latorre, Jeff Pool, Darko Stosic, Dusan Stosic, Ganesh Venkatesh, Chong Yu, Paulius Micikevicius

Recovery of Sparse and Low Rank Components of Matrices Using Iterative Method with Adaptive Thresholding

In this letter, we propose an algorithm for recovery of sparse and low rank components of matrices using an iterative method with adaptive thresholding. In each iteration, the low rank and sparse components are obtained using a thresholding…

Numerical Analysis · Computer Science · 2017-04-13 · Nematollah Zarmehi, Farokh Marvasti

Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity

Network pruning can reduce the high computation cost of deep neural network (DNN) models. However, to maintain their accuracies, sparse models often carry randomly-distributed weights, leading to irregular computations. Consequently, sparse…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-09-01 · Cong Guo, Bo Yang Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li, Minyi Guo, Yuhao Zhu