Related papers: Cache oblivious storage and access heuristics for …

Memory-Usage Advantageous Block Recursive Matrix Inverse

The inversion of extremely high-order matrices has been a challenging task because of the limited processing and memory capacity of conventional computers. In a scenario in which the data does not fit in memory, it is worth considering…

Numerical Analysis · Mathematics 2018-05-08 Iria C. S. Cosme, Isaac F. Fernandes, João L. de Carvalho, Samuel Xavier-de-Souza

Cache-oblivious Matrix Multiplication for Exact Factorisation

We present a cache-oblivious adaptation of matrix multiplication to be incorporated in the parallel TU decomposition for rectangular matrices over finite fields, based on the Morton-hybrid space-filling curve representation. To realise…

Numerical Analysis · Computer Science 2017-05-16 Fatima K. Abu Salem, Mira Al Arab
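
For readers unfamiliar with the layout, a minimal sketch of plain Morton (Z-order) indexing follows: the index of element (row, col) is formed by interleaving the bits of the row and column numbers, so each quadrant of the matrix, and recursively each sub-quadrant, occupies a contiguous index range. This is the standard Morton order only, not the Morton-hybrid representation used in the paper; the function names are illustrative, and the sketch assumes row and column indices fit in 16 bits.

```python
def spread_bits(x: int) -> int:
    """Insert a zero bit between each of the low 16 bits of x (abcd -> 0a0b0c0d)."""
    x &= 0x0000FFFF
    x = (x | (x << 8)) & 0x00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F
    x = (x | (x << 2)) & 0x33333333
    x = (x | (x << 1)) & 0x55555555
    return x


def morton_index(row: int, col: int) -> int:
    """Z-order index: column bits in even positions, row bits in odd positions."""
    return (spread_bits(row) << 1) | spread_bits(col)


def morton_order(n: int) -> list[tuple[int, int]]:
    """All (row, col) cells of an n x n matrix (n a power of two) sorted in Morton order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells, key=lambda rc: morton_index(rc[0], rc[1]))
```

For example, `morton_order(4)` visits the four cells of the top-left 2 x 2 quadrant first and then those of the top-right quadrant, which is the locality property that recursive, cache-oblivious kernels exploit.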

Improving the Space-Time Efficiency of Processor-Oblivious Matrix Multiplication Algorithms

Classic cache-oblivious parallel matrix multiplication algorithms achieve optimality either in time or in space, but not both, which has prompted much research on the best possible balance or trade-off for such algorithms. We study modern…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-14 Yuan Tang
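
As background for the cache-oblivious abstracts above, here is a minimal sequential sketch of the classic divide-and-conquer multiplication: the largest of the three dimensions is halved until the subproblem is small, so at some recursion depth the operands fit in cache without the code ever knowing the cache size. It is not the processor-oblivious parallel algorithm studied in the paper, and `base` is an assumed cutoff that only limits Python recursion overhead.

```python
Matrix = list[list[float]]


def _rec_mul(A: Matrix, B: Matrix, C: Matrix,
             i0: int, k0: int, j0: int, m: int, n: int, p: int, base: int) -> None:
    """Accumulate C[i0:i0+m, j0:j0+p] += A[i0:i0+m, k0:k0+n] @ B[k0:k0+n, j0:j0+p]."""
    if max(m, n, p) <= base:
        # Small enough: plain triple loop over the sub-blocks.
        for i in range(i0, i0 + m):
            for k in range(k0, k0 + n):
                a = A[i][k]
                for j in range(j0, j0 + p):
                    C[i][j] += a * B[k][j]
        return
    if m >= n and m >= p:
        # Split the rows of A and C.
        h = m // 2
        _rec_mul(A, B, C, i0, k0, j0, h, n, p, base)
        _rec_mul(A, B, C, i0 + h, k0, j0, m - h, n, p, base)
    elif p >= n:
        # Split the columns of B and C.
        h = p // 2
        _rec_mul(A, B, C, i0, k0, j0, m, n, h, base)
        _rec_mul(A, B, C, i0, k0, j0 + h, m, n, p - h, base)
    else:
        # Split the shared dimension; both halves accumulate into the same C block.
        h = n // 2
        _rec_mul(A, B, C, i0, k0, j0, m, h, p, base)
        _rec_mul(A, B, C, i0, k0 + h, j0, m, n - h, p, base)


def co_matmul(A: Matrix, B: Matrix, base: int = 32) -> Matrix:
    """Recursive product of an m x n matrix A and an n x p matrix B (base must be >= 1)."""
    m, n, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(m)]
    _rec_mul(A, B, C, 0, 0, 0, m, n, p, base)
    return C
```

Splitting the largest dimension keeps subproblems roughly square even for rectangular inputs, which is what makes the working set shrink geometrically with recursion depth.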

Blockwise inversion and algorithms for inverting large partitioned matrices

Block matrix structure commonly arises in various physics and engineering applications. There are various advantages in preserving the block structure while computing the inverse of such partitioned matrices. In this context, using…

Numerical Analysis · Mathematics 2023-11-22 R. Thiru Senthil
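
The block identity that blockwise inversion builds on can be sketched as follows. Writing M = [[A, B], [C, D]] with A square and invertible, and S = D - C A^{-1} B (the Schur complement of A), the inverse is M^{-1} = [[A^{-1} + A^{-1} B S^{-1} C A^{-1}, -A^{-1} B S^{-1}], [-S^{-1} C A^{-1}, S^{-1}]]. The sketch below applies that identity recursively, inverting A and S with the same routine. It is a textbook illustration, not the algorithm of this paper or of the block recursive inverse listed above: it does no pivoting, so it assumes every leading block and Schur complement it meets is invertible (true, for instance, for symmetric positive definite matrices), and it works on in-memory Python lists.

```python
Matrix = list[list]  # entries may be float or fractions.Fraction


def mul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain matrix product X @ Y."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X: Matrix, Y: Matrix) -> Matrix:
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]


def sub(X: Matrix, Y: Matrix) -> Matrix:
    return [[a - b for a, b in zip(r, s)] for r, s in zip(X, Y)]


def block_inverse(M: Matrix) -> Matrix:
    """Invert M by recursive 2 x 2 blocking and Schur complements (no pivoting)."""
    n = len(M)
    if n == 1:
        return [[1 / M[0][0]]]
    h = n // 2
    A = [r[:h] for r in M[:h]]
    B = [r[h:] for r in M[:h]]
    C = [r[:h] for r in M[h:]]
    D = [r[h:] for r in M[h:]]
    Ai = block_inverse(A)                    # A^{-1}
    AiB = mul(Ai, B)                         # A^{-1} B
    CAi = mul(C, Ai)                         # C A^{-1}
    Si = block_inverse(sub(D, mul(C, AiB)))  # S^{-1}, with S = D - C A^{-1} B
    AiBSi = mul(AiB, Si)                     # A^{-1} B S^{-1}
    top_left = add(Ai, mul(AiBSi, CAi))
    top_right = [[-v for v in r] for r in AiBSi]
    bottom_left = [[-v for v in r] for r in mul(Si, CAi)]
    top = [l + r for l, r in zip(top_left, top_right)]
    bottom = [l + r for l, r in zip(bottom_left, Si)]
    return top + bottom
```

Using `fractions.Fraction` entries keeps the arithmetic exact, which makes the result easy to check against the identity matrix; with floats, and without pivoting, accuracy can degrade when a leading block is ill-conditioned.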

Memory Bounds for Concurrent Bounded Queues

Concurrent data structures often require additional memory for handling synchronization issues in addition to memory for storing elements. Depending on the amount of this additional memory, implementations can be more or less…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-17 Vitaly Aksenov, Nikita Koval, Petr Kuznetsov, Anton Paramonov

Is Sparse Matrix Reordering Effective for Sparse Matrix-Vector Multiplication?

This work evaluates the impact of sparse matrix reordering on the performance of sparse matrix-vector multiplication across different multicore CPU platforms. Reordering can significantly enhance performance by optimizing the non-zero…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-23 Omid Asudeh, Sina Mahdipour Saravani, Gerald Sabin, Fabrice Rastello, P Sadayappan

On Memory Footprints of Partitioned Sparse Matrices

Runtime characteristics of sparse matrix computations and related processes can often be improved by reducing the memory footprints of the matrices involved. Such a reduction can usually be achieved when matrices are processed in a block-wise…

Numerical Analysis · Computer Science 2018-01-01 Daniel Langr

Sparse Matrix to Matrix Multiplication: A Representation and Architecture for Acceleration (long version)

Accelerators for sparse matrix multiplication are important components in emerging systems. In this paper, we study the main challenges of accelerating Sparse Matrix Multiplication (SpMM). For situations in which data is not stored in the…

Hardware Architecture · Computer Science 2019-06-04 Pareesa Ameneh Golnari, Sharad Malik

Floating Point Compression of Hierarchical Matrix Formats and its Impact on Matrix-Vector Multiplication

Matrix-vector multiplication forms the basis of many iterative solution algorithms and is therefore also an important algorithm for hierarchical matrices, which are used to represent dense data in an optimized form by applying low-rank…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-30 Ronald Kriemann

Exploiting Symmetry in Tensors for High Performance: Multiplication with Symmetric Tensors

Symmetric tensor operations arise in a wide variety of computations. However, the benefits of exploiting symmetry in order to reduce storage and computation are in conflict with a desire to simplify memory access patterns. In this paper, we…

Numerical Analysis · Mathematics 2014-10-21 Martin D. Schatz, Tze Meng Low, Robert A. van de Geijn, Tamara G. Kolda

On efficient block Krylov-solvers for $\mathcal H^2$-matrices

Hierarchical matrices provide a highly memory-efficient way of storing dense linear operators arising, for example, from boundary element methods, particularly when stored in the $\mathcal H^2$ format. In such data-sparse representations, iterative…

Numerical Analysis · Mathematics 2025-09-23 Sven Christophersen

Algorithms for Parallel Shared-Memory Sparse Matrix-Vector Multiplication on Unstructured Matrices

Sparse matrix-vector multiplication (SpMV) is an important computational kernel, but it is notoriously difficult to execute efficiently. This paper investigates algorithm performance for unstructured sparse matrices, which are more…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-27 Kobe Bergmans, Karl Meerbergen, Raf Vandebril
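
To make the SpMV kernel concrete, a minimal sequential version over the compressed sparse row (CSR) format is sketched below. CSR is assumed here as a common baseline; the abstracts in this list do not say which storage formats the papers use, and the helper names are illustrative. The indirect read `x[col_idx[k]]` follows the column pattern of the matrix, which is one reason unstructured matrices give poor cache locality and why reordering rows and columns can matter.

```python
def to_csr(dense: list[list[float]]) -> tuple[list[float], list[int], list[int]]:
    """Build CSR arrays (values, col_idx, row_ptr) from a dense row-major matrix."""
    values: list[float] = []
    col_idx: list[int] = []
    row_ptr = [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # Row i occupies values[row_ptr[i]:row_ptr[i + 1]].
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_spmv(values: list[float], col_idx: list[int], row_ptr: list[int],
             x: list[float]) -> list[float]:
    """Compute y = A x for A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Streaming read of values/col_idx, indirect (pattern-dependent) read of x.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y
```

Each row is independent, so a shared-memory version would typically distribute rows across threads; balancing rows with very different nonzero counts is part of what makes unstructured matrices harder.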

Performance limitations for sparse matrix-vector multiplications on current multicore environments

The increasing importance of multicore processors calls for a reevaluation of established numerical algorithms in view of their ability to profit from this new hardware concept. In order to optimize the existing algorithms, a detailed…

Performance · Computer Science 2012-03-01 Gerald Schubert, Georg Hager, Holger Fehske

BMF: Block matrix approach to factorization of large scale data

Matrix Factorization (MF) on large-scale matrices is a computationally as well as memory-intensive task. Alternative convergence techniques are needed when the size of the input matrix is larger than the available memory on a Central…

Machine Learning · Computer Science 2019-01-21 Prasad G Bhavana, Vineet C Nair

Algorithmic Building Blocks for Asymmetric Memories

The future of main memory appears to lie in the direction of new non-volatile memory technologies that provide strong capacity-to-performance ratios, but have write operations that are much more expensive than reads in terms of energy,…

Data Structures and Algorithms · Computer Science 2018-06-28 Yan Gu, Yihan Sun, Guy E. Blelloch

On Algorithmic Cache Optimization

We study matrix-matrix multiplication of two matrices, $A$ and $B$, each of size $n \times n$. This operation results in a matrix $C$ of size $n\times n$. Our goal is to produce $C$ as efficiently as possible given a cache: a 1-D limited…

Data Structures and Algorithms · Computer Science 2023-11-15 Neil Bhavikatti

A Framework for Practical Parallel Fast Matrix Multiplication

Matrix multiplication is a fundamental computation in many scientific disciplines. In this paper, we show that novel fast matrix multiplication algorithms can significantly outperform vendor implementations of the classical algorithm and…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-08 Austin R. Benson, Grey Ballard

Distributed-memory $\mathcal{H}$-matrix Algebra I: Data Distribution and Matrix-vector Multiplication

We introduce a data distribution scheme for $\mathcal{H}$-matrices and a distributed-memory algorithm for $\mathcal{H}$-matrix-vector multiplication. Our data distribution scheme avoids an expensive $\Omega(P^2)$ scheduling procedure used…

Numerical Analysis · Mathematics 2020-09-23 Yingzhou Li, Jack Poulson, Lexing Ying

Loading Large Sparse Matrices Stored in Files in the Adaptive-Blocking Hierarchical Storage Format

A parallel algorithm for loading large sparse matrices from files into the distributed memories of high performance computing (HPC) systems is presented. The algorithm was designed specifically for matrices stored in files in the space-efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-12-30 Daniel Langr, Ivan Šimeček, Pavel Tvrdík

Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm

We propose several new schedules for Strassen-Winograd's matrix multiplication algorithm that reduce the extra memory allocation requirements by three different means: by introducing a few pre-additions, by overwriting the input matrices,…

Mathematical Software · Computer Science 2009-05-18 Brice Boyer, Jean-Guillaume Dumas, Clément Pernet, Wei Zhou
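
For reference, the standard Winograd variant of Strassen's algorithm is sketched below: each recursion level uses 7 block products and 15 block additions instead of 8 products. Every temporary (S1 to S4, T1 to T4, P1 to P7) is allocated afresh for readability, so this sketch carries exactly the extra memory footprint that the paper's schedules are designed to reduce; it does not reproduce those schedules. Odd sizes, and sizes at or below the assumed `cutoff`, fall back to the classical product.

```python
Matrix = list[list[float]]


def add(X: Matrix, Y: Matrix) -> Matrix:
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]


def sub(X: Matrix, Y: Matrix) -> Matrix:
    return [[a - b for a, b in zip(r, s)] for r, s in zip(X, Y)]


def classical(X: Matrix, Y: Matrix) -> Matrix:
    """Ordinary matrix product X @ Y."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def quadrants(M: Matrix, h: int):
    """Split a 2h x 2h matrix into its four h x h blocks."""
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def winograd(A: Matrix, B: Matrix, cutoff: int = 64) -> Matrix:
    """Square product via the Strassen-Winograd recursion (7 products, 15 additions)."""
    n = len(A)
    if n <= cutoff or n % 2:
        return classical(A, B)
    h = n // 2
    A11, A12, A21, A22 = quadrants(A, h)
    B11, B12, B21, B22 = quadrants(B, h)
    # Pre-additions: 8 of the 15 block additions.
    S1 = add(A21, A22)
    S2 = sub(S1, A11)
    S3 = sub(A11, A21)
    S4 = sub(A12, S2)
    T1 = sub(B12, B11)
    T2 = sub(B22, T1)
    T3 = sub(B22, B12)
    T4 = sub(T2, B21)
    # The seven recursive block products.
    P1 = winograd(A11, B11, cutoff)
    P2 = winograd(A12, B21, cutoff)
    P3 = winograd(S4, B22, cutoff)
    P4 = winograd(A22, T4, cutoff)
    P5 = winograd(S1, T1, cutoff)
    P6 = winograd(S2, T2, cutoff)
    P7 = winograd(S3, T3, cutoff)
    # Post-additions: the remaining 7.
    U2 = add(P1, P6)
    U3 = add(U2, P7)
    U4 = add(U2, P5)
    C11 = add(P1, P2)
    C12 = add(U4, P3)
    C21 = sub(U3, P4)
    C22 = add(U3, P5)
    top = [l + r for l, r in zip(C11, C12)]
    bottom = [l + r for l, r in zip(C21, C22)]
    return top + bottom
```

In practice the cutoff is tuned so that the recursion stops once blocks are small enough for an optimized classical kernel to be faster than further splitting.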