Related papers: Implementing Communication-Optimal Parallel and Se…

Communication-optimal parallel and sequential QR and LU factorizations

We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR. We prove optimality by extending…

Numerical Analysis · Mathematics 2008-08-21 James Demmel , Laura Grigori , Mark Hoemmen , Julien Langou

QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment

Previous studies have reported that common dense linear algebra operations do not achieve speed up by using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-15 Emmanuel Agullo , Camille Coti , Jack Dongarra , Thomas Herault , Julien Langou

Communication-optimal parallel and sequential QR and LU factorizations: theory and practice

We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR. Our first algorithm, Tall Skinny…

Numerical Analysis · Computer Science 2008-08-29 James Demmel , Laura Grigori , Mark Hoemmen , Julien Langou

Communication-avoiding Cholesky-QR2 for rectangular matrices

Scalable QR factorization algorithms for solving least squares and eigenvalue problems are critical given the increasing parallelism within modern machines. We introduce a more general parallelization of the CholeskyQR2 algorithm and show…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-18 Edward Hutter , Edgar Solomonik

Fast Parallel Randomized QR with Column Pivoting Algorithms for Reliable Low-rank Matrix Approximations

Factorizing large matrices by QR with column pivoting (QRCP) is substantially more expensive than QR without pivoting, owing to communication costs required for pivoting decisions. In contrast, randomized QRCP (RQRCP) algorithms have proven…

Numerical Analysis · Mathematics 2018-04-17 Jianwei Xiao , Ming Gu , Julien Langou

Parallel Tiled QR Factorization for Multicore Architectures

As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…

Numerical Analysis · Mathematics 2008-08-12 Alfredo Buttari , Julien Langou , Jakub Kurzak , Jack Dongarra

Efficient Task Graph Scheduling for Parallel QR Factorization in SLSQP

Efficient task scheduling is paramount in parallel programming on multi-core architectures, where tasks are fundamental computational units. QR factorization is a critical sub-routine in Sequential Least Squares Quadratic Programming…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-12 Soumyajit Chatterjee , Rahul Utkoor , Uppu Eshwar , Sathya Peri , V. Krishna Nandivada

Hierarchical QR factorization algorithms for multi-core cluster systems

This paper describes a new QR factorization algorithm which is especially designed for massively parallel platforms combining parallel distributed multi-core nodes. These platforms make the present and the foreseeable future of…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-08-27 Jack Dongarra , Mathieu Faverge , Thomas Herault , Julien Langou , and Yves Robert

QR factorization of ill-conditioned tall-and-skinny matrices on distributed-memory systems

In this paper we present a novel algorithm developed for computing the QR factorisation of extremely ill-conditioned tall-and-skinny matrices on distributed memory systems. The algorithm is based on the communication-avoiding CholeskyQR2…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-08 Nenad Mijić , Abhiram Kaushik , Davor Davidović

Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures

The QR factorization and the SVD are two fundamental matrix decompositions with applications throughout scientific computing and data analysis. For matrices with many more rows than columns, so-called "tall-and-skinny matrices," there is a…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-08 Austin R. Benson , David F. Gleich , James Demmel

Implementation of QR factorization of tall and very skinny matrices on current GPUs

We consider the problem of computing a QR (or QZ) decomposition of a real, dense, tall and very skinny matrix. That is, the number of columns is tiny compared to the number of rows, rendering most computations completely or partially…

Mathematical Software · Computer Science 2026-03-24 Jonas Thies , Melven Röhrig-Zöllner

Fast multiplication of random dense matrices with fixed sparse matrices

This work focuses on accelerating the multiplication of a dense random matrix with a (fixed) sparse matrix, which is frequently used in sketching algorithms. We develop a novel scheme that takes advantage of blocking and recomputation…

Computational Engineering, Finance, and Science · Computer Science 2024-05-14 Tianyu Liang , Riley Murray , Aydın Buluç , James Demmel

Parallel QR Factorization of Block Low-Rank Matrices

We present two new algorithms for Householder QR factorization of Block Low-Rank (BLR) matrices: one that performs block-column-wise QR, and another that is based on tiled QR. We show how the block-column-wise algorithm exploits BLR…

Numerical Analysis · Mathematics 2022-08-15 M. Ridwan Apriansyah , Rio Yokota

Randomized QR with Column Pivoting

The dominant contribution to communication complexity in factorizing a matrix using QR with column pivoting is due to column-norm updates that are required to process pivot decisions. We use randomized sampling to approximate this process…

Numerical Analysis · Mathematics 2018-01-23 Jed A. Duersch , Ming Gu

Communication-Optimal Parallel Algorithm for Strassen's Matrix Multiplication

Parallel matrix multiplication is one of the most studied fundamental problems in distributed and high performance computing. We obtain a new parallel algorithm that is based on Strassen's fast matrix multiplication and minimizes…

Data Structures and Algorithms · Computer Science 2012-02-16 Grey Ballard , James Demmel , Olga Holtz , Benjamin Lipshitz , Oded Schwartz

Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures

The algorithms in the current sequential numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research…

Mathematical Software · Computer Science 2010-02-23 Emmanuel Agullo , Henricus Bouwmeester , Jack Dongarra , Jakub Kurzak , Julien Langou , Lee Rosenberg

Highly Parallel Sparse Matrix-Matrix Multiplication

Generalized sparse matrix-matrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-09 Aydın Buluç , John R. Gilbert

A 3D Parallel Algorithm for QR Decomposition

Interprocessor communication often dominates the runtime of large matrix computations. We present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the cost of increasing…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-15 Grey Ballard , James Demmel , Laura Grigori , Mathias Jacquelin , Nicholas Knight

Multilevel communication optimal LU and QR factorizations for hierarchical platforms

This study focuses on the performance of two classical dense linear algebra algorithms, the LU and the QR factorizations, on multilevel hierarchical platforms. We first introduce a new model called Hierarchical Cluster Platform (HCP),…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-03-26 Laura Grigori , Mathias Jacquelin , Amal Khabou

A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures

As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…

Mathematical Software · Computer Science 2008-06-12 Alfredo Buttari , Julien Langou , Jakub Kurzak , Jack Dongarra