Related papers: Cache-oblivious Matrix Multiplication for Exact Fa…

Comparative study of space filling curves for cache oblivious TU Decomposition

We examine several matrix layouts based on space-filling curves that allow for a cache-oblivious adaptation of parallel TU decomposition for rectangular matrices over finite fields. The TU algorithm of \cite{Dumas} requires index conversion…

Symbolic Computation · Computer Science 2016-12-20 Fatima K. Abu Salem, Mira Al Arab

Improving the Space-Time Efficiency of Processor-Oblivious Matrix Multiplication Algorithms

Classic cache-oblivious parallel matrix multiplication algorithms achieve optimality either in time or space, but not both, which promotes lots of research on the best possible balance or tradeoff of such algorithms. We study modern…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-14 Yuan Tang

Cache oblivious storage and access heuristics for blocked matrix-matrix multiplication

We investigate effects of ordering in blocked matrix-matrix multiplication. We find that submatrices do not have to be stored contiguously in memory to achieve near-optimal performance. Instead, it is the choice of execution order of the…

Data Structures and Algorithms · Computer Science 2008-08-15 Nicolas Bock, Emanuel H. Rubensson, Paweł Sałek, Anders M. N. Niklasson, Matt Challacombe

AQ-Stacker: An Adaptive Quantum Matrix Multiplication Algorithm with Scaling via Parallel Hadamard Stacking

Matrix multiplication (MatMul) is the computational backbone of modern machine learning, yet its classical complexity remains a bottleneck for large-scale data processing. We propose a hybrid quantum-classical algorithm for matrix…

Quantum Physics · Physics 2026-04-15 Wladimir Silva

PCOT: Cache Oblivious Tiling of Polyhedral Programs

This paper studies two variants of tiling: iteration space tiling (or loop blocking) and cache-oblivious methods that recursively split the iteration space with divide-and-conquer. The key question to answer is when we should be using one…

Programming Languages · Computer Science 2018-02-02 Waruna Ranasinghe, Nirmal Prajapati, Tomofumi Yuki, Sanjay Rajopadhye

Efficient Distributed-Memory Parallel Matrix-Vector Multiplication with Wide or Tall Unstructured Sparse Matrices

This paper presents an efficient technique for matrix-vector and vector-transpose-matrix multiplication in distributed-memory parallel computing environments, where the matrices are unstructured, sparse, and have a substantially larger…

Mathematical Software · Computer Science 2018-12-04 Jonathan Eckstein, Gyorgy Matyasfalvi

On Algorithmic Cache Optimization

We study matrix-matrix multiplication of two matrices, $A$ and $B$, each of size $n \times n$. This operation results in a matrix $C$ of size $n\times n$. Our goal is to produce $C$ as efficiently as possible given a cache: a 1-D limited…

Data Structures and Algorithms · Computer Science 2023-11-15 Neil Bhavikatti

Adaptive multiplication of rank-structured matrices in linear complexity

Hierarchical matrices approximate a given matrix by a decomposition into low-rank submatrices that can be handled efficiently in factorized form. $\mathcal{H}^2$-matrices refine this representation following the ideas of fast multipole…

Numerical Analysis · Mathematics 2024-04-24 Steffen Börm

Balanced Partitioning of Several Cache-Oblivious Algorithms

Frigo et al. proposed an ideal cache model and a recursive technique to design sequential cache-efficient algorithms in a cache-oblivious fashion. Ballard et al. pointed out that it is a fundamental open problem to extend the technique to…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-04 Yuan Tang, Weiguo Gao

Efficiently Parallelizable Strassen-Based Multiplication of a Matrix by its Transpose

The multiplication of a matrix by its transpose, $A^T A$, appears as an intermediate operation in the solution of a wide set of problems. In this paper, we propose a new cache-oblivious algorithm (ATA) for computing this product, based upon…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-08 Viviana Arrigoni, Filippo Maggioli, Annalisa Massini, Emanuele Rodolà

A Framework for Practical Parallel Fast Matrix Multiplication

Matrix multiplication is a fundamental computation in many scientific disciplines. In this paper, we show that novel fast matrix multiplication algorithms can significantly outperform vendor implementations of the classical algorithm and…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-08 Austin R. Benson, Grey Ballard

On the Power of Adaptivity in Matrix Completion and Approximation

We consider the related tasks of matrix completion and matrix approximation from missing data and propose adaptive sampling procedures for both problems. We show that adaptive sampling allows one to eliminate standard incoherence…

Machine Learning · Statistics 2014-07-15 Akshay Krishnamurthy, Aarti Singh

Look-ups are not (yet) all you need for deep learning inference

Fast approximations to matrix multiplication have the potential to dramatically reduce the cost of neural network inference. Recent work on approximate matrix multiplication proposed to replace costly multiplications with table-lookups by…

Machine Learning · Computer Science 2022-07-14 Calvin McCarter, Nicholas Dronen

Efficient cache oblivious algorithms for randomized divide-and-conquer on the multicore model

In this paper we present randomized algorithms for sorting and convex hull that achieve optimal performance (for speed-up and cache misses) on the multicore model with private caches. Our algorithms are cache oblivious and generalize…

Data Structures and Algorithms · Computer Science 2012-05-29 Neeraj Sharma, Sandeep Sen

CUTEv2: Unified and Configurable Matrix Extension for Diverse CPU Architectures with Minimal Design Overhead

Matrix extensions have emerged as an essential feature in modern CPUs to address the surging demands of AI workloads. However, existing designs often incur substantial hardware and software design overhead. Tight coupling with the CPU…

Hardware Architecture · Computer Science 2026-04-14 Jinpeng Ye, Chongxi Wang, Wenqing Li, Bin Yuan, Shiyi Wang, Fenglu Zhang, Junyu Yue, Jianan Xie, Yunhao Ye, Haoyu Deng, Yingkun Zhou, Xin Cheng, Fuxin Zhang, Jian Wang

Error correction in fast matrix multiplication and inverse

We present new algorithms to detect and correct errors in the product of two matrices, or the inverse of a matrix, over an arbitrary field. Our algorithms do not require any additional information or encoding other than the original inputs…

Symbolic Computation · Computer Science 2018-02-08 Daniel S. Roche

Mixed precision matrix interpolative decompositions for model reduction

Renewed interest in mixed-precision algorithms has emerged due to growing data capacity and bandwidth concerns, as well as the advancement of GPUs, which enable significant speedup for low precision arithmetic. In light of this, we propose…

Numerical Analysis · Mathematics 2020-12-14 Alec Michael Dunton, Alyson Fox

Cache-Oblivious Parallel Convex Hull in the Binary Forking Model

We present two cache-oblivious sorting-based convex hull algorithms in the Binary Forking Model. The first is an algorithm for a presorted set of points which achieves $O(n)$ work, $O(\log n)$ span, and $O(n/B)$ serial cache complexity,…

Data Structures and Algorithms · Computer Science 2023-07-18 Reilly Browne, Rezaul Chowdhury, Shih-Yu Tsai, Yimin Zhu

Optimal Exact Matrix Completion Under new Parametrization

We study the problem of exact completion of an $m \times n$ matrix of rank $r$ with the adaptive sampling method. We relate the exact completion problem to the sparsest vector of the column and row spaces (which we call…

Machine Learning · Computer Science 2022-03-08 Ilqar Ramazanli, Barnabas Poczos

Combinatorial and Recurrent Approaches for Efficient Matrix Inversion: Sub-cubic algorithms leveraging Fast Matrix products

In this paper, we introduce novel fast matrix inversion algorithms that leverage triangular decomposition and recurrent formalism, incorporating Strassen's fast matrix multiplication. Our research places particular emphasis on triangular…

Numerical Analysis · Mathematics 2026-02-05 Mohamed Kamel Riahi