Related papers: Communication-avoiding Cholesky-QR2 for rectangula…

Shifted CholeskyQR for computing the QR factorization of ill-conditioned matrices

The Cholesky QR algorithm is an efficient communication-minimizing algorithm for computing the QR factorization of a tall-skinny matrix. Unfortunately, it suffers from inherent numerical instability and breakdown when the matrix is…

Numerical Analysis · Mathematics · 2018-10-01 · Takeshi Fukaya, Ramaseshan Kannan, Yuji Nakatsukasa, Yusaku Yamamoto, Yuka Yanagisawa
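A minimal NumPy sketch of Cholesky QR and of shifted CholeskyQR3 (one shifted pass followed by CholeskyQR2) may help make the idea concrete. This is not the authors' code: `cholesky_qr` and `shifted_cholesky_qr3` are hypothetical names, and the default shift (a multiple of machine epsilon times $\|X\|_2^2$, large enough to keep the shifted Gram matrix positive definite) is an assumption chosen for illustration; consult the paper for the shift it actually analyzes.

```python
import numpy as np

def cholesky_qr(X, shift=0.0):
    """One (optionally shifted) Cholesky QR pass: X is approximately Q @ R."""
    G = X.T @ X                                      # n x n Gram matrix
    n = G.shape[0]
    L = np.linalg.cholesky(G + shift * np.eye(n))    # G + s*I = L @ L.T, L lower triangular
    R = L.T                                          # upper-triangular R factor
    Q = np.linalg.solve(R.T, X.T).T                  # Q = X @ inv(R) without forming inv(R)
    return Q, R

def shifted_cholesky_qr3(X, shift=None):
    """Shifted CholeskyQR, then two plain CholeskyQR passes (CholeskyQR2)."""
    m, n = X.shape
    if shift is None:
        # Assumed illustrative shift: a multiple of machine epsilon times ||X||_2^2,
        # so that G + s*I stays positive definite despite rounding errors in G.
        eps = np.finfo(X.dtype).eps
        shift = 11 * (m * n + n * (n + 1)) * eps * np.linalg.norm(X, 2) ** 2
    Q1, R1 = cholesky_qr(X, shift)   # better conditioned than X, not yet orthonormal
    Q2, R2 = cholesky_qr(Q1)
    Q3, R3 = cholesky_qr(Q2)
    return Q3, R3 @ R2 @ R1          # since X ~ Q1 R1 ~ Q2 R2 R1 ~ Q3 R3 R2 R1

# Usage: a 1000 x 10 matrix with condition number about 1e10, for which
# X.T @ X is numerically singular and plain Cholesky QR is unreliable.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((1000, 10)))
V, _ = np.linalg.qr(rng.standard_normal((10, 10)))
X = (U * np.logspace(0, -10, 10)) @ V.T
Q, R = shifted_cholesky_qr3(X)
```

The shift trades exact orthogonality in the first pass for a guaranteed-successful Cholesky factorization; the two plain passes that follow then restore orthogonality because the first pass has already reduced the condition number.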

QR factorization of ill-conditioned tall-and-skinny matrices on distributed-memory systems

In this paper we present a novel algorithm developed for computing the QR factorisation of extremely ill-conditioned tall-and-skinny matrices on distributed-memory systems. The algorithm is based on the communication-avoiding CholeskyQR2…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-05-08 · Nenad Mijić, Abhiram Kaushik, Davor Davidović
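The communication pattern that makes CholeskyQR2 attractive on distributed-memory systems can be simulated on one machine by splitting X into row blocks, one per process. In this sketch the MPI all-reduce of the local Gram matrices is replaced by a Python `sum`; it is plain CholeskyQR2 (suitable only for moderately conditioned X), not the paper's extension to extremely ill-conditioned matrices, and the function names are hypothetical.

```python
import numpy as np

def cholqr_pass(blocks):
    """One CholeskyQR pass over row blocks, one block per (simulated) process."""
    # Each process forms its local n x n Gram matrix; summing them stands in
    # for the single all-reduce, which is the only communication step.
    G = sum(B.T @ B for B in blocks)
    R = np.linalg.cholesky(G).T           # small, so every process can compute it redundantly
    # Each process forms its own rows of Q = X @ inv(R) with no further communication.
    return [np.linalg.solve(R.T, B.T).T for B in blocks], R

def cholesky_qr2_blocks(blocks):
    """CholeskyQR2: a second pass restores the orthogonality lost in the first."""
    Q1, R1 = cholqr_pass(blocks)
    Q2, R2 = cholqr_pass(Q1)
    return Q2, R2 @ R1

rng = np.random.default_rng(1)
X = rng.standard_normal((4000, 8))
blocks = np.array_split(X, 4)             # 4 simulated processes
Q_blocks, R = cholesky_qr2_blocks(blocks)
Q = np.vstack(Q_blocks)
```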

Implementing Communication-Optimal Parallel and Sequential QR Factorizations

We present parallel and sequential dense QR factorization algorithms for tall and skinny matrices and general rectangular matrices that both minimize communication and are as stable as Householder QR. The sequential and parallel algorithms…

Numerical Analysis · Mathematics · 2008-09-16 · James Demmel, Laura Grigori, Mark Hoemmen, Julien Langou
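The reduction-tree idea behind TSQR (Tall Skinny QR) can be sketched sequentially with NumPy: each row block is factored independently, and only small n x n R factors are combined up a binary tree. This illustrative sketch computes R only; the Q factor, which TSQR keeps implicitly as the tree of local factors, is not formed, and `tsqr_r` is a hypothetical name.

```python
import numpy as np

def tsqr_r(X, num_blocks=4):
    """R factor of a tall-skinny X from a binary reduction tree of small QRs."""
    # Leaves: each row block (one per processor) is factored independently.
    Rs = [np.linalg.qr(B, mode="r") for B in np.array_split(X, num_blocks)]
    # Tree levels: stack pairs of n x n R factors and factor again until one remains.
    while len(Rs) > 1:
        next_level = []
        for i in range(0, len(Rs), 2):
            if i + 1 < len(Rs):
                next_level.append(np.linalg.qr(np.vstack([Rs[i], Rs[i + 1]]), mode="r"))
            else:
                next_level.append(Rs[i])   # an unpaired node moves up unchanged
        Rs = next_level
    return Rs[0]

rng = np.random.default_rng(2)
X = rng.standard_normal((2000, 6))
R = tsqr_r(X, num_blocks=5)
R_ref = np.linalg.qr(X, mode="r")          # ordinary Householder QR for comparison
```

Since R is unique only up to the signs of its rows, the result matches a direct QR of X entrywise in absolute value; in a parallel setting each tree level costs one message of n x n data per pair of processors.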

On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations

Matrix factorizations are among the most important building blocks of scientific computing. State-of-the-art libraries, however, are not communication-optimal, underutilizing current parallel architectures. We present novel algorithms for…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-04-26 · Grzegorz Kwasniewski, Marko Kabić, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Jens Eirik Saethre, André Gaillard, Timo Schneider, Maciej Besta, Anton Kozhevnikov, Joost VandeVondele, Torsten Hoefler

Analysis of Randomized Householder-Cholesky QR Factorization with Multisketching

CholeskyQR2 and shifted CholeskyQR3 are two state-of-the-art algorithms for computing tall-and-skinny QR factorizations since they attain high performance on current computer architectures. However, to guarantee stability, for some…

Numerical Analysis · Mathematics · 2025-09-17 · Andrew J. Higgins, Daniel B. Szyld, Erik G. Boman, Ichitaro Yamazaki

Randomized Cholesky QR factorizations

This article proposes and analyzes several variants of the randomized Cholesky QR factorization of a matrix $X$. Instead of computing the R factor from $X^T X$, as is done by standard methods, we obtain it from a small, efficiently…

Numerical Analysis · Mathematics 2022-10-25 Oleg Balabanov
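The core idea of obtaining R from a small sketch of X rather than from $X^T X$ can be illustrated as follows: a QR of the sketch ΩX gives a preconditioner R1, and one Cholesky QR pass on X R1^{-1} then restores orthogonality. The Gaussian Ω, the sketch size k = 4n, and the name `randomized_cholesky_qr` are assumptions for illustration; the article analyzes several variants, and this sketch is not claimed to match any one of them exactly.

```python
import numpy as np

def randomized_cholesky_qr(X, sketch_rows=None, rng=None):
    """R from a small Gaussian sketch of X, then one Cholesky QR pass for orthogonality."""
    m, n = X.shape
    k = 4 * n if sketch_rows is None else sketch_rows   # assumed sketch size, k << m
    rng = np.random.default_rng(rng)
    Omega = rng.standard_normal((k, m)) / np.sqrt(k)    # assumed Gaussian sketching matrix
    R1 = np.linalg.qr(Omega @ X, mode="r")              # small k x n QR instead of X.T @ X
    Q1 = np.linalg.solve(R1.T, X.T).T                   # X @ inv(R1): well conditioned w.h.p.
    R2 = np.linalg.cholesky(Q1.T @ Q1).T                # Cholesky QR on the preconditioned Q1
    Q = np.linalg.solve(R2.T, Q1.T).T
    return Q, R2 @ R1

rng = np.random.default_rng(3)
U, _ = np.linalg.qr(rng.standard_normal((2000, 10)))
V, _ = np.linalg.qr(rng.standard_normal((10, 10)))
X = (U * np.logspace(0, -6, 10)) @ V.T                  # condition number about 1e6
Q, R = randomized_cholesky_qr(X, rng=4)
```

The expensive pass over all m rows in the first step is a single product Ω @ X, while the condition number of Q1 depends on how well Ω preserves the column space of X rather than on the condition number of X.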

QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment

Previous studies have reported that common dense linear algebra operations do not achieve a speedup when using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-11-15 · Emmanuel Agullo, Camille Coti, Jack Dongarra, Thomas Herault, Julien Langou

Communication-optimal parallel and sequential QR and LU factorizations

We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR. We prove optimality by extending…

Numerical Analysis · Mathematics · 2008-08-21 · James Demmel, Laura Grigori, Mark Hoemmen, Julien Langou

Communication-optimal parallel and sequential QR and LU factorizations: theory and practice

We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR. Our first algorithm, Tall Skinny…

Numerical Analysis · Computer Science · 2008-08-29 · James Demmel, Laura Grigori, Mark Hoemmen, Julien Langou

A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures

As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…

Mathematical Software · Computer Science · 2008-06-12 · Alfredo Buttari, Julien Langou, Jakub Kurzak, Jack Dongarra

Communication-optimal Parallel and Sequential Cholesky Decomposition

Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case).…

Numerical Analysis · Computer Science · 2011-02-02 · Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz

Efficient Task Graph Scheduling for Parallel QR Factorization in SLSQP

Efficient task scheduling is paramount in parallel programming on multi-core architectures, where tasks are fundamental computational units. QR factorization is a critical sub-routine in Sequential Least Squares Quadratic Programming…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-06-12 · Soumyajit Chatterjee, Rahul Utkoor, Uppu Eshwar, Sathya Peri, V. Krishna Nandivada

Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block…

Mathematical Software · Computer Science · 2016-01-26 · Kyungjoo Kim, Sivasankaran Rajamanickam, George Stelle, H. Carter Edwards, Stephen L. Olivier

Analysis of randomized CholeskyQR for sparse matrices

This work is about rounding error analysis of randomized CholeskyQR-type algorithms for sparse matrices. QR factorizations of sparse matrices arise in many real-world problems. In this work, we focus on some typical…

Numerical Analysis · Mathematics · 2025-11-10 · Haoran Guan, Yuwei Fan

CholeskyQR with Randomization and Pivoting for Tall Matrices (CQRRPT)

This paper develops and analyzes a new algorithm for QR decomposition with column pivoting (QRCP) of rectangular matrices with many more rows than columns. The algorithm carefully combines methods from randomized numerical linear algebra to…

Numerical Analysis · Mathematics · 2025-03-18 · Maksim Melnichenko, Oleg Balabanov, Riley Murray, James Demmel, Michael W. Mahoney, Piotr Luszczek

Implementation of QR factorization of tall and very skinny matrices on current GPUs

We consider the problem of computing a QR (or QZ) decomposition of a real, dense, tall and very skinny matrix. That is, the number of columns is tiny compared to the number of rows, rendering most computations completely or partially…

Mathematical Software · Computer Science · 2026-03-24 · Jonas Thies, Melven Röhrig-Zöllner

Parallel Tiled QR Factorization for Multicore Architectures

As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…

Numerical Analysis · Mathematics · 2008-08-12 · Alfredo Buttari, Julien Langou, Jakub Kurzak, Jack Dongarra

Fast Parallel Randomized QR with Column Pivoting Algorithms for Reliable Low-rank Matrix Approximations

Factorizing large matrices by QR with column pivoting (QRCP) is substantially more expensive than QR without pivoting, owing to communication costs required for pivoting decisions. In contrast, randomized QRCP (RQRCP) algorithms have proven…

Numerical Analysis · Mathematics · 2018-04-17 · Jianwei Xiao, Ming Gu, Julien Langou

A 3D Parallel Algorithm for QR Decomposition

Interprocessor communication often dominates the runtime of large matrix computations. We present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the cost of increasing…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-05-15 · Grey Ballard, James Demmel, Laura Grigori, Mathias Jacquelin, Nicholas Knight

Randomized QR with Column Pivoting

The dominant contribution to communication complexity in factorizing a matrix using QR with column pivoting is due to column-norm updates that are required to process pivot decisions. We use randomized sampling to approximate this process…

Numerical Analysis · Mathematics · 2018-01-23 · Jed A. Duersch, Ming Gu
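The idea shared by the two randomized QRCP abstracts, making pivot decisions from a small random sketch B = ΩA instead of from the column norms of A, can be illustrated with a short NumPy sketch that applies greedy column pivoting to B only. The Gaussian Ω, the oversampling of 8 rows, the unblocked Gram-Schmidt-style update, and the name `sketched_pivots` are assumptions; the papers' blocked algorithms and the subsequent QR of A on the chosen columns are not reproduced.

```python
import numpy as np

def sketched_pivots(A, k, oversample=8, rng=None):
    """Pick k pivot columns of A using only a small Gaussian sketch B = Omega @ A."""
    m, n = A.shape
    rng = np.random.default_rng(rng)
    Omega = rng.standard_normal((k + oversample, m))  # assumed Gaussian sampling matrix
    B = Omega @ A                                     # (k + oversample) x n, far smaller than A
    chosen = np.zeros(n, dtype=bool)
    pivots = []
    for _ in range(k):
        norms = np.linalg.norm(B, axis=0)             # sketched column norms, recomputed each step
        norms[chosen] = -np.inf                       # never pick the same column twice
        j = int(np.argmax(norms))
        pivots.append(j)
        chosen[j] = True
        q = B[:, j] / np.linalg.norm(B[:, j])
        B = B - np.outer(q, q @ B)                    # remove the chosen direction from every column
    return pivots

# Usage: three columns are scaled up, so they should be chosen first.
rng = np.random.default_rng(5)
A = rng.standard_normal((500, 20))
A[:, [3, 7, 11]] *= 1000.0
p = sketched_pivots(A, 3, rng=6)
```

Because B has only k + oversample rows, every norm update touches a tiny matrix, which is where the reduction in communication for pivot decisions comes from.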