Related papers: Matrix Multiplication Using Only Addition

Multiplying Matrices Without Multiplying

Multiplying matrices is among the most fundamental and compute-intensive operations in machine learning. Consequently, there has been significant work on efficiently approximating matrix multiplies. We introduce a learning-based algorithm…

Machine Learning · Computer Science 2021-08-17 Davis Blalock, John Guttag

Accelerating Matrix Multiplication: A Performance Comparison Between Multi-Core CPU and GPU

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-30 Mufakir Qamar Ansari, Mudabir Qamar Ansari

A Deep Learning Inference Scheme Based on Pipelined Matrix Multiplication Acceleration Design and Non-uniform Quantization

Matrix multiplication is the bedrock in Deep Learning inference application. When it comes to hardware acceleration on edge computing devices, matrix multiplication often takes up a great majority of the time. To achieve better performance…

Machine Learning · Computer Science 2021-10-12 Yuyang Zhang, Dik Hin Leung, Min Guo, Yijia Xiao, Haoyue Liu, Yunfei Li, Jiyuan Zhang, Guan Wang, Zhen Chen

Fair and Square: Replacing One Real Multiplication with a Single Square and One Complex Multiplication with Three Squares When Performing Matrix Multiplication and Convolutions

This paper shows that, for matrix multiplications and convolutions, it is possible to asymptotically replace each real multiplication with a single squaring operation. Similarly, a single complex multiplication can be replaced with 3…

Hardware Architecture · Computer Science 2026-03-11 Vincenzo Liguori

Accurate Models of NVIDIA Tensor Cores

Matrix multiplication is a fundamental operation in both training of neural networks and inference. To accelerate matrix multiplication, Graphical Processing Units (GPUs) provide it implemented in hardware. Due to the increased throughput…

Mathematical Software · Computer Science 2026-04-07 Faizan A. Khattak, Mantas Mikaitis

Sparse Matrix Multiplication On An Associative Processor

Sparse matrix multiplication is an important component of linear algebra computations. Implementing sparse matrix multiplication on an associative processor (AP) enables high level of parallelism, where a row of one matrix is multiplied in…

Mathematical Software · Computer Science 2017-05-23 L. Yavits, A. Morad, R. Ginosar

Large Scale Artificial Neural Network Training Using Multi-GPUs

This paper describes a method for accelerating large scale Artificial Neural Networks (ANN) training using multi-GPUs by reducing the forward and backward passes to matrix multiplication. We propose an out-of-core multi-GPU matrix…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-16 Linnan Wang, Wei Wu, Jianxiong Xiao, Yang Yi

A Non-Volatile All-Spin Non-Binary Matrix Multiplier: An Efficient Hardware Accelerator for Machine Learning

We propose and analyze a compact and non-volatile nanomagnetic (all-spin) non-binary matrix multiplier performing the multiply-and-accumulate (MAC) operation using two magnetic tunnel junctions - one activated by strain to act as the…

Emerging Technologies · Computer Science 2023-02-28 Rahnuma Rahman, Supriyo Bandyopadhyay

Matrix multiplication using quantum-dot cellular automata to implement conventional microelectronics

Quantum-dot cellular automata (QCA) shows promise as a post silicon CMOS, low power computational technology. Nevertheless, to generalize QCA for next-generation digital devices, the ability to implement conventional programmable circuits…

Mesoscale and Nanoscale Physics · Physics 2011-10-10 Joshua D. Wood, P. Douglas Tougaw

A Heterogeneous Accelerated Matrix Multiplication: OpenCL + APU + GPU+ Fast Matrix Multiply

As users and developers, we are witnessing the opening of a new computing scenario: the introduction of hybrid processors into a single die, such as an accelerated processing unit (APU) processor, and the plug-and-play of additional…

Mathematical Software · Computer Science 2012-05-15 Paolo D'Alberto

Fast Matrix Multiplication Without Tears: A Constraint Programming Approach

It is known that the multiplication of an $N \times M$ matrix with an $M \times P$ matrix can be performed using fewer multiplications than what the naive $NMP$ approach suggests. The most famous instance of this is Strassen's algorithm for…

Artificial Intelligence · Computer Science 2023-07-18 Arnaud Deza, Chang Liu, Pashootan Vaezipoor, Elias B. Khalil

A Framework for Practical Parallel Fast Matrix Multiplication

Matrix multiplication is a fundamental computation in many scientific disciplines. In this paper, we show that novel fast matrix multiplication algorithms can significantly outperform vendor implementations of the classical algorithm and…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-08 Austin R. Benson, Grey Ballard

Blocking Techniques for Sparse Matrix Multiplication on Tensor Accelerators

Tensor accelerators have gained popularity because they provide a cheap and efficient solution for speeding up computational-expensive tasks in Deep Learning and, more recently, in other Scientific Computing applications. However, since…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-15 Paolo Sylos Labini, Massimo Bernaschi, Francesco Silvestri, Flavio Vella

Hyper-Systolic Matrix Multiplication

A novel parallel algorithm for matrix multiplication is presented. The hyper-systolic algorithm makes use of a one-dimensional processor abstraction. The procedure can be implemented on all types of parallel systems. It can handle…

Mathematical Software · Computer Science 2007-05-23 Thomas Lippert, Nikolay Petkov, Paolo Palazzari, Klaus Schilling

Look-ups are not (yet) all you need for deep learning inference

Fast approximations to matrix multiplication have the potential to dramatically reduce the cost of neural network inference. Recent work on approximate matrix multiplication proposed to replace costly multiplications with table-lookups by…

Machine Learning · Computer Science 2022-07-14 Calvin McCarter, Nicholas Dronen

SMM-Conv: Scalar Matrix Multiplication with Zero Packing for Accelerated Convolution

We present a novel approach for accelerating convolutions during inference for CPU-based architectures. The most common method of computation involves packing the image into the columns of a matrix (im2col) and performing general matrix…

Computer Vision and Pattern Recognition · Computer Science 2024-11-26 Amir Ofir, Gil Ben-Artzi

Matrix Multiplication in the MPC Model

In this paper, we present algorithms to solve matrix multiplication problems in the MPC model. In particular, we consider the problem under various processor/memory constraints in the MPC model and prove the following results. 1.…

Computational Complexity · Computer Science 2025-09-30 Lakshya Joshi, Arya Deshmukh, Atharv Chhabra, Chetan Gupta

Accelerated Multiple Precision Matrix Multiplication using Strassen's Algorithm and Winograd's Variant

The Strassen algorithm and Winograd's variant accelerate matrix multiplication by using fewer arithmetic operations than standard matrix multiplication. Although many papers have been published to accelerate single- as well as…

Numerical Analysis · Mathematics 2015-10-27 Tomonori Kouya

Exploring Commutative Matrix Multiplication Schemes via Flip Graphs

We explore new approaches for finding matrix multiplication algorithms in the commutative setting by adapting the flip graph technique: a method previously shown to be effective for discovering fast algorithms in the non-commutative case.…

Symbolic Computation · Computer Science 2025-06-30 Isaac Wood

On Fast Computation of a Circulant Matrix-Vector Product

This paper deals with circulant matrices. It is shown that a circulant matrix can be multiplied by a vector in time O(n log(n)) in a ring with roots of unity without making use of an FFT algorithm. With our algorithm we achieve a speedup of…

Data Structures and Algorithms · Computer Science 2021-03-05 Andreas Rosowski