Related papers: Parallel QR Factorization of Block Low-Rank Matric…
As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…
This paper describes a new QR factorization algorithm which is especially designed for massively parallel platforms combining parallel distributed multi-core nodes. These platforms make the present and the foreseeable future of…
We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR. Our first algorithm, Tall Skinny…
The unpivoted and pivoted Householder QR factorizations are ubiquitous in numerical linear algebra. A difficulty with pivoted Householder QR is the communication bottleneck introduced by pivoting. In this paper we propose using random…
In this work, we develop a new fast algorithm, spaQR -- sparsified QR, for solving large, sparse linear systems. The key to our approach is using low-rank approximations to sparsify the separators in a Nested Dissection based Householder QR…
Recent advances in transformer-based foundation models have made them the default choice for many tasks, but their rapidly growing size makes fitting a full model on a single GPU increasingly difficult and their computational cost…
A fundamental problem when adding column pivoting to the Householder QR factorization is that only about half of the computation can be cast in terms of high performing matrix-matrix multiplications, which greatly limits the benefits that…
Efficient task scheduling is paramount in parallel programming on multi-core architectures, where tasks are fundamental computational units. QR factorization is a critical sub-routine in Sequential Least Squares Quadratic Programming…
The dominant contribution to communication complexity in factorizing a matrix using QR with column pivoting is due to column-norm updates that are required to process pivot decisions. We use randomized sampling to approximate this process…
We present parallel and sequential dense QR factorization algorithms for tall and skinny matrices and general rectangular matrices that both minimize communication, and are as stable as Householder QR. The sequential and parallel algorithms…
The current computer architecture has moved towards the multi/many-core structure. However, the algorithms in the current sequential dense numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multi/many-core…
In this paper, a hierarchical Tucker low-rank (HTLR) matrix is proposed to approximate non-oscillatory kernel functions in linear complexity. The HTLR matrix is based on the hierarchical matrix, with the low-rank blocks replaced by Tucker…
In this work, we present randomized compression algorithms for flat rank-structured matrices with shared bases, termed uniform Block Low-Rank (BLR) matrices. Our main contribution is a technique called tagging, which improves upon the…
We consider the problem of computing a QR (or QZ) decomposition of a real, dense, tall and very skinny matrix. That is, the number of columns is tiny compared to the number of rows, rendering most computations completely or partially…
The efficient and accurate QR decomposition for matrices with hierarchical low-rank structures, such as HODLR and hierarchical matrices, has been challenging. Existing structure-exploiting algorithms are prone to numerical instability as…
In this work, we develop a fast hierarchical solver for solving large, sparse least squares problems. We build upon the algorithm, spaQR (sparsified QR), that was developed by the authors to solve large sparse linear systems. Our algorithm…
Factorizing large matrices by QR with column pivoting (QRCP) is substantially more expensive than QR without pivoting, owing to communication costs required for pivoting decisions. In contrast, randomized QRCP (RQRCP) algorithms have proven…
Scalable QR factorization algorithms for solving least squares and eigenvalue problems are critical given the increasing parallelism within modern machines. We introduce a more general parallelization of the CholeskyQR2 algorithm and show…
This paper proposes a scalable binary CUR low-rank approximation algorithm that leverages parallel selection of representative rows and columns within a deterministic framework. By employing a blockwise adaptive cross approximation…