Related papers: High Performance Matrix Multiplication

Evaluation of computational and energy performance in matrix multiplication algorithms on CPU and GPU using MKL, cuBLAS and SYCL

Matrix multiplication is fundamental in the backpropagation algorithm used to train deep neural network models. Libraries like Intel's MKL or NVIDIA's cuBLAS implemented new and optimized matrix multiplication techniques that increase…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-28 L. A. Torres, Carlos J. Barrios H, Yves Denneulin

Accelerating Matrix Multiplication: A Performance Comparison Between Multi-Core CPU and GPU

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-30 Mufakir Qamar Ansari, Mudabir Qamar Ansari

Fast Matrix Multiplication via Compiler-only Layered Data Reorganization and Intrinsic Lowering

The resurgence of machine learning has increased the demand for high-performance basic linear algebra subroutines (BLAS), which have long depended on libraries to achieve peak performance on commodity hardware. High-performance BLAS…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-30 Braedy Kuzma, Ivan Korostelev, João P. L. de Carvalho, José E. Moreira, Christopher Barton, Guido Araujo, José Nelson Amaral

KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators

KBLAS is a new open source high performance library that provides optimized kernels for a subset of Level 2 BLAS functionalities on CUDA-enabled GPUs. Since performance of dense matrix-vector multiplication is hindered by the overhead of…

Mathematical Software · Computer Science 2014-10-08 Ahmad Abdelfattah, David Keyes, Hatem Ltaief

Multiplicação de matrizes: uma comparação entre as abordagens sequencial (CPU) e paralela (GPU) [Matrix multiplication: a comparison between the sequential (CPU) and parallel (GPU) approaches]

Designing problems using matrices is very important in Computer Science. Fields like computer graphics, graph theory, and machine learning use matrices very often to solve their own problems. The most frequent matrix operation is the…

Performance · Computer Science 2019-05-10 Andre G. C. Pacheco

A Heterogeneous Accelerated Matrix Multiplication: OpenCL + APU + GPU + Fast Matrix Multiply

As users and developers, we are witnessing the opening of a new computing scenario: the introduction of hybrid processors into a single die, such as an accelerated processing unit (APU) processor, and the plug-and-play of additional…

Mathematical Software · Computer Science 2012-05-15 Paolo D'Alberto

Accelerating R with high performance linear algebra libraries

Linear algebra routines are basic building blocks of statistical software. In this paper we analyzed how we can improve R performance for matrix computations. We benchmarked a few matrix operations using the standard linear algebra…

Mathematical Software · Computer Science 2018-03-21 Bogdan Oancea, Tudorel Andrei, Raluca Mariana Dragoescu

A Framework for Practical Parallel Fast Matrix Multiplication

Matrix multiplication is a fundamental computation in many scientific disciplines. In this paper, we show that novel fast matrix multiplication algorithms can significantly outperform vendor implementations of the classical algorithm and…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-08 Austin R. Benson, Grey Ballard

Performance Engineering for Real and Complex Tall & Skinny Matrix Multiplication Kernels on GPUs

General matrix-matrix multiplications with double-precision real and complex entries (DGEMM and ZGEMM) in vendor-supplied BLAS libraries are best optimized for square matrices but often show bad performance for tall & skinny matrices, which…

Mathematical Software · Computer Science 2020-06-25 Dominik Ernst, Georg Hager, Jonas Thies, Gerhard Wellein

Reliable Generation of High-Performance Matrix Algebra

Scientific programmers often turn to vendor-tuned Basic Linear Algebra Subprograms (BLAS) to obtain portable high performance. However, many numerical algorithms require several BLAS calls in sequence, and those successive calls result in…

Mathematical Software · Computer Science 2012-05-09 Geoffrey Belter, Elizabeth Jessup, Thomas Nelson, Boyana Norris, Jeremy G. Siek

A Deep Learning Inference Scheme Based on Pipelined Matrix Multiplication Acceleration Design and Non-uniform Quantization

Matrix multiplication is the bedrock of Deep Learning inference applications. When it comes to hardware acceleration on edge computing devices, matrix multiplication often takes up a great majority of the time. To achieve better performance…

Machine Learning · Computer Science 2021-10-12 Yuyang Zhang, Dik Hin Leung, Min Guo, Yijia Xiao, Haoyue Liu, Yunfei Li, Jiyuan Zhang, Guan Wang, Zhen Chen

Adaptive multiplication of rank-structured matrices in linear complexity

Hierarchical matrices approximate a given matrix by a decomposition into low-rank submatrices that can be handled efficiently in factorized form. $\mathcal{H}^2$-matrices refine this representation following the ideas of fast multipole…

Numerical Analysis · Mathematics 2024-04-24 Steffen Börm

Tuning Technique for Multiple Precision Dense Matrix Multiplication using Prediction of Computational Time

Although reliable long precision floating-point arithmetic libraries such as QD and MPFR/GMP are necessary to solve ill-conditioned problems in numerical simulation, long precision BLAS-level computation such as matrix multiplication has…

Mathematical Software · Computer Science 2017-10-06 Tomonori Kouya

Highly Parallel Sparse Matrix-Matrix Multiplication

Generalized sparse matrix-matrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-09 Aydın Buluç, John R. Gilbert

High-Performance Level-1 and Level-2 BLAS

The introduction of the Basic Linear Algebra Subroutine (BLAS) in the 1970s paved the way for different libraries to solve the same problem with an improved approach and hardware. The new BLAS implementation led to High-Performance…

Mathematical Software · Computer Science 2021-08-05 Amit Singh, Cem Bassoy

Analysis of the Performance of the Matrix Multiplication Algorithm on the Cirrus Supercomputer

Matrix multiplication is integral to various scientific and engineering disciplines, including machine learning, image processing, and gaming. With the increasing data volumes in areas like machine learning, the demand for efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-29 Temitayo Adefemi

Accurate Models of NVIDIA Tensor Cores

Matrix multiplication is a fundamental operation in both training of neural networks and inference. To accelerate matrix multiplication, Graphical Processing Units (GPUs) provide it implemented in hardware. Due to the increased throughput…

Mathematical Software · Computer Science 2026-04-07 Faizan A. Khattak, Mantas Mikaitis

Fast Matrix Multiplication Without Tears: A Constraint Programming Approach

It is known that the multiplication of an $N \times M$ matrix with an $M \times P$ matrix can be performed using fewer multiplications than what the naive $NMP$ approach suggests. The most famous instance of this is Strassen's algorithm for…

Artificial Intelligence · Computer Science 2023-07-18 Arnaud Deza, Chang Liu, Pashootan Vaezipoor, Elias B. Khalil

Matrix Multiplication, Trilinear Decompositions, APA Algorithms, and Summation

Matrix multiplication (hereafter we use the acronym MM) is among the most fundamental operations of modern computations. The efficiency of its performance depends on various factors, in particular vectorization, data movement and arithmetic…

Data Structures and Algorithms · Computer Science 2015-02-09 Victor Y. Pan

Sparse Matrix Multiplication On An Associative Processor

Sparse matrix multiplication is an important component of linear algebra computations. Implementing sparse matrix multiplication on an associative processor (AP) enables high level of parallelism, where a row of one matrix is multiplied in…

Mathematical Software · Computer Science 2017-05-23 L. Yavits, A. Morad, R. Ginosar