Related papers: Strassen Multisystolic Array Hardware Architecture…

200 papers

Matrix multiplication is a cornerstone operation in a wide array of scientific fields, including machine learning and computer graphics. The standard algorithm for matrix multiplication has a complexity of $\mathcal{O}(n^3)$ for $n\times n$…

Hardware Architecture · Computer Science 2024-06-05 Afzal Ahmad, Linfeng Du, Wei Zhang

Matrix multiplication is a fundamental computation in many scientific disciplines. In this paper, we show that novel fast matrix multiplication algorithms can significantly outperform vendor implementations of the classical algorithm and…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-08 Austin R. Benson, Grey Ballard

In this study, we propose a simple method for fault-tolerant Strassen-like matrix multiplications. The proposed method is based on using two distinct Strassen-like algorithms instead of replicating a given one. We have realized that using…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-11 Osman B. Guney, Suayb S. Arslan

The computation and memory-intensive nature of DNNs limits their use in many mobile and embedded contexts. Application-specific integrated circuit (ASIC) hardware accelerators employ matrix multiplication units (such as the systolic arrays)…

Hardware Architecture · Computer Science 2024-02-02 Ruiqi Sun, Yinchen Ni, Xin He, Jie Zhao, An Zou

Recently, reinforcement algorithms discovered new algorithms that really jump-started a wave of excitement and a flourishing of publications. However, there is little on implementations, applications, and, especially, no absolute…

Mathematical Software · Computer Science 2023-12-21 Paolo D'Alberto

Large-scale floating-point matrix multiplication is a fundamental kernel in many scientific and engineering applications. Most existing work focuses only on accelerating matrix multiplication on FPGA by adopting a linear systolic array. This…

Hardware Architecture · Computer Science 2018-03-13 Junzhong Shen, Yuran Qiao, You Huang, Mei Wen, Chunyuan Zhang

Fast algorithms for matrix multiplication, namely those that perform asymptotically fewer scalar operations than the classical algorithm, have been considered primarily of theoretical interest. Apart from Strassen's original algorithm, few…

Numerical Analysis · Computer Science 2016-07-26 Grey Ballard, Austin R. Benson, Alex Druinsky, Benjamin Lipshitz, Oded Schwartz

In this paper, we consider the HLS implementation of a three-dimensional systolic array architecture for matrix multiplication that targets specific characteristics of Intel Stratix 10 FPGAs in order to produce designs that achieve a high…

Hardware Architecture · Computer Science 2021-10-25 Paolo Gorlani, Christian Plessl

While the Karatsuba algorithm reduces the complexity of large integer multiplication, the extra additions required minimize its benefits for smaller integers of more commonly-used bitwidths. In this work, we propose the extension of the…

Hardware Architecture · Computer Science 2025-01-16 Trevor E. Pogue, Nicola Nicolici

This paper presents a new fast, highly scalable distributed matrix multiplication algorithm on Apache Spark, called Stark, based on Strassen's matrix multiplication algorithm. Stark preserves Strassen's 7 multiplications scheme in a…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-26 Chandan Misra, Sourangshu Bhattacharya, Soumya K. Ghosh

After Strassen presented the first sub-cubic matrix multiplication algorithm, many Strassen-like algorithms have been presented. Most of those with low asymptotic cost have large hidden leading coefficients and are thus impractical. To reduce the…

Symbolic Computation · Computer Science 2022-03-31 Pu Wu, Huiqing Jiang, Zehui Shao, Jin Xu

A large fraction of the arithmetic operations required to evaluate deep neural networks (DNNs) consists of matrix multiplications, in both convolution and fully connected layers. We perform end-to-end learning of low-cost approximations of…

Machine Learning · Computer Science 2018-06-11 Michael Tschannen, Aran Khanna, Anima Anandkumar

It is known that the multiplication of an $N \times M$ matrix with an $M \times P$ matrix can be performed using fewer multiplications than what the naive $NMP$ approach suggests. The most famous instance of this is Strassen's algorithm for…

Artificial Intelligence · Computer Science 2023-07-18 Arnaud Deza, Chang Liu, Pashootan Vaezipoor, Elias B. Khalil

We dispel with "street wisdom" regarding the practical implementation of Strassen's algorithm for matrix-matrix multiplication (DGEMM). Conventional wisdom: it is only practical for very large matrices. Our implementation is practical for…

Mathematical Software · Computer Science 2016-05-05 Jianyu Huang, Tyler M. Smith, Greg M. Henry, Robert A. van de Geijn

Modern deep learning models have high memory and computation cost. To make them fast and memory-cost efficient, structured model pruning is commonly used. We find that pruning a model using a common training accelerator with large systolic…

Machine Learning · Computer Science 2020-04-29 Sangkug Lym, Mattan Erez

The Strassen algorithm and Winograd's variant accelerate matrix multiplication by using fewer arithmetic operations than standard matrix multiplication. Although many papers have been published to accelerate single- as well as…

Numerical Analysis · Mathematics 2015-10-27 Tomonori Kouya

FPGA architectures have recently been enhanced to meet the substantial computational demands of modern deep neural networks (DNNs). To this end, both FPGA vendors and academic researchers have proposed in-fabric blocks that perform…

Hardware Architecture · Computer Science 2025-02-07 Endri Taka, Ning-Chi Huang, Chi-Chih Chang, Kai-Chiang Wu, Aman Arora, Diana Marculescu

Parallel matrix multiplication is one of the most studied fundamental problems in distributed and high performance computing. We obtain a new parallel algorithm that is based on Strassen's fast matrix multiplication and minimizes…

Data Structures and Algorithms · Computer Science 2012-02-16 Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz, Oded Schwartz

A tight $\Omega((n/\sqrt{M})^{\log_2 7}M)$ lower bound is derived on the I/O complexity of Strassen's algorithm to multiply two $n \times n$ matrices, in a two-level storage hierarchy with $M$ words of fast memory. A proof technique is…

Data Structures and Algorithms · Computer Science 2016-05-10 Gianfranco Bilardi, Lorenzo De Stefani

Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-10 Mehmet Deveci, Christian Trott, Sivasankaran Rajamanickam