Related papers: Communication-Optimal Parallel Algorithm for Stras…
A parallel algorithm has perfect strong scaling if its running time on P processors is linear in 1/P, including all communication costs. Distributed-memory parallel algorithms for matrix multiplication with perfect strong scaling have only…
Matrix multiplication is a fundamental computation in many scientific disciplines. In this paper, we show that novel fast matrix multiplication algorithms can significantly outperform vendor implementations of the classical algorithm and…
The communication cost of algorithms (also known as I/O-complexity) is shown to be closely related to the expansion properties of the corresponding computation graphs. We demonstrate this on Strassen's and other fast matrix multiplication…
The last decade has witnessed an explosion in the development of models, theory and computational algorithms for "big data" analysis. In particular, distributed computing has served as a natural and dominating paradigm for statistical…
The multiplication of a matrix by its transpose, $A^T A$, appears as an intermediate operation in the solution of a wide set of problems. In this paper, we propose a new cache-oblivious algorithm (ATA) for computing this product, based upon…
Matrix multiplication $A^t A$ appears as intermediate operation during the solution of a wide set of problems. In this paper, we propose a new cache-oblivious algorithm for the $A^t A$ multiplication. Our algorithm, A$\scriptstyle…
Fast matrix multiplication can be described as searching for low-rank decompositions of the matrix--multiplication tensor. We design a neural architecture, \textsc{StrassenNet}, which reproduces the Strassen algorithm for $2\times 2$…
Classic cache-oblivious parallel matrix multiplication algorithms achieve optimality either in time or space, but not both, which promotes lots of research on the best possible balance or tradeoff of such algorithms. We study modern…
In this study, we propose a simple method for fault-tolerant Strassen-like matrix multiplications. The proposed method is based on using two distinct Strassen-like algorithms instead of replicating a given one. We have realized that using…
Sketching is widely used in randomized linear algebra for low-rank matrix approximation, column subset selection, and many other problems, and it has gained significant traction in machine learning applications. However, sketching large…
Fast algorithms for matrix multiplication, namely those that perform asymptotically fewer scalar operations than the classical algorithm, have been considered primarily of theoretical interest. Apart from Strassen's original algorithm, few…
Recently, reinforcement algorithms discovered new algorithms that really jump-started a wave of excitements and a flourishing of publications. However, there is little on implementations, applications, and, especially, no absolute…
Matrix multiplication is a cornerstone operation in a wide array of scientific fields, including machine learning and computer graphics. The standard algorithm for matrix multiplication has a complexity of $\mathcal{O}(n^3)$ for $n\times n$…
Communication lower bounds have long been established for matrix multiplication algorithms. However, most methods of asymptotic analysis have either ignored the constant factors or not obtained the tightest possible values. Recent work has…
Generalized sparse matrix-matrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an…
Modern applied optimization problems become more and more complex every day. Due to this fact, distributed algorithms that can speed up the process of solving an optimization problem through parallelization are of great importance. The main…
A tight $\Omega((n/\sqrt{M})^{\log_2 7}M)$ lower bound is derived on the \io complexity of Strassen's algorithm to multiply two $n \times n$ matrices, in a two-level storage hierarchy with $M$ words of fast memory. A proof technique is…
Obeying constraints imposed by classical physics, we give optimal fine-grained algorithms for matrix multiplication and problems involving graphs and mazes, where all calculations are done in 3-dimensional space. We assume that whatever the…
In this article, we focus on the communication costs of three symmetric matrix computations: i) multiplying a matrix with its transpose, known as a symmetric rank-k update (SYRK) ii) adding the result of the multiplication of a matrix with…
It is well known that Strassen and Winograd algorithms can reduce the computational costs associated with dense matrix multiplication. We have already shown that they are also very effective for software-based multiple precision…