Related papers: On the Mesh Array for Matrix Multiplication
This note looks at the efficiency of the cross-wired mesh array in the context of matrix multiplication. It is shown that in the case of repeated operations, the average number of steps to multiply sets of $n \times n$ matrices on a 2D cross-wired mesh…
Obeying constraints imposed by classical physics, we give optimal fine-grained algorithms for matrix multiplication and problems involving graphs and mazes, where all calculations are done in 3-dimensional space. We assume that whatever the…
This paper presents new results on randomization using Kak's Mesh Array for matrix multiplication. These results include the periods of the longest cycles when the array is used for scrambling and the autocorrelation function of the…
It is widely known that the algorithmic complexity of square matrix multiplication is bounded below by $n^2$ arithmetic operations. The justification builds upon the following reasoning: given that there are $2 n^2$…
Matrix multiplication is a fundamental computation in many scientific disciplines. In this paper, we show that novel fast matrix multiplication algorithms can significantly outperform vendor implementations of the classical algorithm and…
Matrix multiplication consumes a large fraction of the time taken in many machine-learning algorithms. Thus, accelerator chips that perform matrix multiplication faster than conventional processors or even GPUs are of increasing interest.…
This paper shows that, for matrix multiplications and convolutions, it is possible to asymptotically replace each real multiplication with a single squaring operation. Similarly, a single complex multiplication can be replaced with 3…
Large-scale floating-point matrix multiplication is a fundamental kernel in many scientific and engineering applications. Most existing work focuses only on accelerating matrix multiplication on FPGAs by adopting a linear systolic array. This…
Matrix multiplication is the foundation for much of the success of high-performance technologies like deep learning, scientific simulations, and video graphics. High-level programming languages like Python and R rely on highly optimized…
A novel parallel algorithm for matrix multiplication is presented. The hyper-systolic algorithm makes use of a one-dimensional processor abstraction. The procedure can be implemented on all types of parallel systems. It can handle…
Multiplying matrices is among the most fundamental and compute-intensive operations in machine learning. Consequently, there has been significant work on efficiently approximating matrix multiplies. We introduce a learning-based algorithm…
While Strassen's matrix multiplication algorithm reduces the complexity of naive matrix multiplication, general-purpose hardware is not suitable for achieving the algorithm's promised theoretical speedups. This leaves the question of whether it…
We present an approximate algorithm for matrix multiplication based on matrix sketching techniques. First, one of the matrices is chosen and sparsified using the online matrix sketching algorithm, and then the matrix product is calculated…
The flip graph algorithm is a method for discovering new matrix multiplication schemes by following random walks on a graph. We introduce a version of the flip graph algorithm for matrix multiplication schemes that admit certain symmetries.…
It has been known since the 1970s that no more than 23 multiplications are required for computing the product of two $3 \times 3$ matrices. It is not known whether this can also be done with fewer multiplications. However, there are several mutually…
Systolic arrays have proven to be highly efficient for parallelized matrix-matrix multiplication (MMM), utilizing synchronized, heartbeat-like data flows across an array of processing elements. While optical structures such as waveguide…
In the Python world, NumPy arrays are the standard representation for numerical data. Here, we show how these arrays enable efficient implementation of numerical computations in a high-level language. Overall, three techniques are applied…
Schemes for exact multiplication of small matrices have a large symmetry group. This group defines an equivalence relation on the set of multiplication schemes. There are algorithms to decide whether two schemes are equivalent. However, for…
It is known that the multiplication of an $N \times M$ matrix with an $M \times P$ matrix can be performed using fewer multiplications than what the naive $NMP$ approach suggests. The most famous instance of this is Strassen's algorithm for…
The multiplication of matrices is an important arithmetic operation in computational mathematics. In the context of hierarchical matrices, this operation can be realized by the multiplication of structured block-wise low-rank matrices,…