Related papers: A Framework for Practical Parallel Fast Matrix Mul…
Fast algorithms for matrix multiplication, namely those that perform asymptotically fewer scalar operations than the classical algorithm, have been considered primarily of theoretical interest. Apart from Strassen's original algorithm, few…
Parallel matrix multiplication is one of the most studied fundamental problems in distributed and high performance computing. We obtain a new parallel algorithm that is based on Strassen's fast matrix multiplication and minimizes…
Generalized sparse matrix-matrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an…
Matrix multiplication is a cornerstone operation in a wide array of scientific fields, including machine learning and computer graphics. The standard algorithm for matrix multiplication has a complexity of $\mathcal{O}(n^3)$ for $n\times n$…
Recently, reinforcement algorithms discovered new algorithms that really jump-started a wave of excitements and a flourishing of publications. However, there is little on implementations, applications, and, especially, no absolute…
Classic cache-oblivious parallel matrix multiplication algorithms achieve optimality either in time or space, but not both, which promotes lots of research on the best possible balance or tradeoff of such algorithms. We study modern…
Matrix multiplication $A^t A$ appears as intermediate operation during the solution of a wide set of problems. In this paper, we propose a new cache-oblivious algorithm for the $A^t A$ multiplication. Our algorithm, A$\scriptstyle…
On distributed memory electronic computers, the implementation and association of fast parallel matrix multiplication algorithms has yielded astounding results and insights. In this discourse, we use the tools of molecular biology to…
A parallel algorithm has perfect strong scaling if its running time on P processors is linear in 1/P, including all communication costs. Distributed-memory parallel algorithms for matrix multiplication with perfect strong scaling have only…
Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…
Matrix multiplication (GEMM) is a core operation to numerous scientific applications. Traditional implementations of Strassen-like fast matrix multiplication (FMM) algorithms often do not perform well except for very large matrix sizes, due…
It is known that the multiplication of an $N \times M$ matrix with an $M \times P$ matrix can be performed using fewer multiplications than what the naive $NMP$ approach suggests. The most famous instance of this is Strassen's algorithm for…
Generalized sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. Here we show that SpGEMM also yields efficient…
In this study, we propose a simple method for fault-tolerant Strassen-like matrix multiplications. The proposed method is based on using two distinct Strassen-like algorithms instead of replicating a given one. We have realized that using…
In this paper we introduce a generic model for multiplicative algorithms which is suitable for the MapReduce parallel programming paradigm. We implement three typical machine learning algorithms to demonstrate how similarity comparison,…
Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we…
A novel parallel algorithm for matrix multiplication is presented. The hyper-systolic algorithm makes use of a one-dimensional processor abstraction. The procedure can be implemented on all types of parallel systems. It can handle…
Obeying constraints imposed by classical physics, we give optimal fine-grained algorithms for matrix multiplication and problems involving graphs and mazes, where all calculations are done in 3-dimensional space. We assume that whatever the…
Matrix multiplication is a fundamental building block for large scale computations arising in various applications, including machine learning. There has been significant recent interest in using coding to speed up distributed matrix…
While Strassen's matrix multiplication algorithm reduces the complexity of naive matrix multiplication, general-purpose hardware is not suitable for achieving the algorithm's promised theoretical speedups. This leaves the question of if it…