Related papers: Communication-Optimal Parallel Algorithm for Stras…

200 papers

A parallel algorithm has perfect strong scaling if its running time on P processors is linear in 1/P, including all communication costs. Distributed-memory parallel algorithms for matrix multiplication with perfect strong scaling have only…

Data Structures and Algorithms · Computer Science 2012-02-16 Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz, Oded Schwartz

Matrix multiplication is a fundamental computation in many scientific disciplines. In this paper, we show that novel fast matrix multiplication algorithms can significantly outperform vendor implementations of the classical algorithm and…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-08 Austin R. Benson, Grey Ballard

The communication cost of algorithms (also known as I/O-complexity) is shown to be closely related to the expansion properties of the corresponding computation graphs. We demonstrate this on Strassen's and other fast matrix multiplication…

Data Structures and Algorithms · Computer Science 2011-09-12 Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz

The last decade has witnessed an explosion in the development of models, theory and computational algorithms for "big data" analysis. In particular, distributed computing has served as a natural and dominating paradigm for statistical…

Machine Learning · Statistics 2018-11-02 Bayan Saparbayeva, Michael Minyi Zhang, Lizhen Lin

The multiplication of a matrix by its transpose, $A^T A$, appears as an intermediate operation in the solution of a wide set of problems. In this paper, we propose a new cache-oblivious algorithm (ATA) for computing this product, based upon…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-08 Viviana Arrigoni, Filippo Maggioli, Annalisa Massini, Emanuele Rodolà

Matrix multiplication $A^t A$ appears as an intermediate operation during the solution of a wide set of problems. In this paper, we propose a new cache-oblivious algorithm for the $A^t A$ multiplication. Our algorithm, A$\scriptstyle…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-07 Viviana Arrigoni, Annalisa Massini

Fast matrix multiplication can be described as searching for low-rank decompositions of the matrix-multiplication tensor. We design a neural architecture, StrassenNet, which reproduces the Strassen algorithm for $2\times 2$…

Classic cache-oblivious parallel matrix multiplication algorithms achieve optimality either in time or space, but not both, which promotes lots of research on the best possible balance or tradeoff of such algorithms. We study modern…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-14 Yuan Tang

In this study, we propose a simple method for fault-tolerant Strassen-like matrix multiplications. The proposed method is based on using two distinct Strassen-like algorithms instead of replicating a given one. We have realized that using…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-11 Osman B. Guney, Suayb S. Arslan

Sketching is widely used in randomized linear algebra for low-rank matrix approximation, column subset selection, and many other problems, and it has gained significant traction in machine learning applications. However, sketching large…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-24 Hussam Al Daas, Grey Ballard, Laura Grigori, Md Taufique Hussain, Suraj Kumar, Mohammad Marufur Rahman, Kathryn Rouse

Fast algorithms for matrix multiplication, namely those that perform asymptotically fewer scalar operations than the classical algorithm, have been considered primarily of theoretical interest. Apart from Strassen's original algorithm, few…

Numerical Analysis · Computer Science 2016-07-26 Grey Ballard, Austin R. Benson, Alex Druinsky, Benjamin Lipshitz, Oded Schwartz

Recently, reinforcement algorithms discovered new algorithms that jump-started a wave of excitement and a flourishing of publications. However, there is little on implementations, applications, and, especially, no absolute…

Mathematical Software · Computer Science 2023-12-21 Paolo D'Alberto

Matrix multiplication is a cornerstone operation in a wide array of scientific fields, including machine learning and computer graphics. The standard algorithm for matrix multiplication has a complexity of $\mathcal{O}(n^3)$ for $n\times n$…

Hardware Architecture · Computer Science 2024-06-05 Afzal Ahmad, Linfeng Du, Wei Zhang

Communication lower bounds have long been established for matrix multiplication algorithms. However, most methods of asymptotic analysis have either ignored the constant factors or not obtained the tightest possible values. Recent work has…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-27 Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse

Generalized sparse matrix-matrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-09 Aydın Buluç, John R. Gilbert

Modern applied optimization problems become more and more complex every day. Due to this fact, distributed algorithms that can speed up the process of solving an optimization problem through parallelization are of great importance. The main…

Optimization and Control · Mathematics 2023-12-14 Svetlana Tkachenko, Artem Andreev, Aleksandr Beznosikov, Alexander Gasnikov

A tight $\Omega((n/\sqrt{M})^{\log_2 7}M)$ lower bound is derived on the I/O complexity of Strassen's algorithm to multiply two $n \times n$ matrices, in a two-level storage hierarchy with $M$ words of fast memory. A proof technique is…

Data Structures and Algorithms · Computer Science 2016-05-10 Gianfranco Bilardi, Lorenzo De Stefani

Obeying constraints imposed by classical physics, we give optimal fine-grained algorithms for matrix multiplication and problems involving graphs and mazes, where all calculations are done in 3-dimensional space. We assume that whatever the…

Data Structures and Algorithms · Computer Science 2024-12-20 Quentin F. Stout

In this article, we focus on the communication costs of three symmetric matrix computations: i) multiplying a matrix with its transpose, known as a symmetric rank-k update (SYRK) ii) adding the result of the multiplication of a matrix with…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse, Mathieu Verite

It is well known that Strassen and Winograd algorithms can reduce the computational costs associated with dense matrix multiplication. We have already shown that they are also very effective for software-based multiple precision…

Numerical Analysis · Mathematics 2016-05-16 Tomonori Kouya