Related papers: Communication-optimal parallel and sequential QR a…
We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR. We prove optimality by extending…
We present parallel and sequential dense QR factorization algorithms for tall and skinny matrices and general rectangular matrices that both minimize communication, and are as stable as Householder QR. The sequential and parallel algorithms…
This study focuses on the performance of two classical dense linear algebra algorithms, the LU and the QR factorizations, on multilevel hierarchical platforms. We first introduce a new model called Hierarchical Cluster Platform (HCP),…
Scalable QR factorization algorithms for solving least squares and eigenvalue problems are critical given the increasing parallelism within modern machines. We introduce a more general parallelization of the CholeskyQR2 algorithm and show…
This paper describes a new QR factorization algorithm which is especially designed for massively parallel platforms combining parallel distributed multi-core nodes. These platforms make the present and the foreseeable future of…
Efficient task scheduling is paramount in parallel programming on multi-core architectures, where tasks are fundamental computational units. QR factorization is a critical sub-routine in Sequential Least Squares Quadratic Programming…
We present two new algorithms for Householder QR factorization of Block Low-Rank (BLR) matrices: one that performs block-column-wise QR, and another that is based on tiled QR. We show how the block-column-wise algorithm exploits BLR…
Previous studies have reported that common dense linear algebra operations do not achieve speed up by using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific…
We consider the problem of computing a QR (or QZ) decomposition of a real, dense, tall and very skinny matrix. That is, the number of columns is tiny compared to the number of rows, rendering most computations completely or partially…
As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…
In this work, we develop a new fast algorithm, spaQR -- sparsified QR, for solving large, sparse linear systems. The key to our approach is using low-rank approximations to sparsify the separators in a Nested Dissection based Householder QR…
In this work, we develop a fast hierarchical solver for solving large, sparse least squares problems. We build upon the algorithm, spaQR (sparsified QR), that was developed by the authors to solve large sparse linear systems. Our algorithm…
This work revisits existing algorithms for the QR factorization of rectangular matrices composed of p-by-q tiles, where p >= q. Within this framework, we study the critical paths and performance of algorithms such as Sameh and Kuck, Modi…
Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case).…
The QR factorization and the SVD are two fundamental matrix decompositions with applications throughout scientific computing and data analysis. For matrices with many more rows than columns, so-called "tall-and-skinny matrices," there is a…
Linear Regression is a seminal technique in statistics and machine learning, where the objective is to build linear predictive models between a response (i.e., dependent) variable and one or more predictor (i.e., independent) variables. In…
Factorizing large matrices by QR with column pivoting (QRCP) is substantially more expensive than QR without pivoting, owing to communication costs required for pivoting decisions. In contrast, randomized QRCP (RQRCP) algorithms have proven…
Communication-avoiding algorithms allow redundant computations to minimize the number of inter-process communications. In this paper, we propose to exploit this redundancy for fault-tolerance purpose. We illustrate this idea with QR…
In this work, we analyze a sublinear-time algorithm for selecting a few rows and columns of a matrix for low-rank approximation purposes. The algorithm is based on an initial uniformly random selection of rows and columns, followed by a…
The QR Decomposition (QRD) of communication channel matrices is a fundamental prerequisite to several detection schemes in Multiple-Input Multiple-Output (MIMO) communication systems. Herein, the main feature of the QRD is to transform the…