
Related papers: Communication-optimal parallel and sequential QR a…

200 papers

We present parallel and sequential dense QR factorization algorithms for tall and skinny matrices and general rectangular matrices that both minimize communication and are as stable as Householder QR. The sequential and parallel algorithms…

Numerical Analysis · Mathematics 2008-09-16 James Demmel, Laura Grigori, Mark Hoemmen, Julien Langou

We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR. Our first algorithm, Tall Skinny…

Numerical Analysis · Computer Science 2008-08-29 James Demmel, Laura Grigori, Mark Hoemmen, Julien Langou

This study focuses on the performance of two classical dense linear algebra algorithms, the LU and the QR factorizations, on multilevel hierarchical platforms. We first introduce a new model called Hierarchical Cluster Platform (HCP),…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-03-26 Laura Grigori, Mathias Jacquelin, Amal Khabou

Previous studies have reported that common dense linear algebra operations do not achieve speed up by using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-15 Emmanuel Agullo, Camille Coti, Jack Dongarra, Thomas Herault, Julien Langou

In 1981 Hong and Kung proved a lower bound on the amount of communication needed to perform dense matrix multiplication using the conventional $O(n^3)$ algorithm, where the input matrices were too large to fit in the small, fast memory. In…

Computational Complexity · Computer Science 2011-09-20 Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz

Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case).…

Numerical Analysis · Computer Science 2011-02-02 Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz

Matrix factorizations are among the most important building blocks of scientific computing. State-of-the-art libraries, however, are not communication-optimal, underutilizing current parallel architectures. We present novel algorithms for…

Scalable QR factorization algorithms for solving least squares and eigenvalue problems are critical given the increasing parallelism within modern machines. We introduce a more general parallelization of the CholeskyQR2 algorithm and show…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-18 Edward Hutter, Edgar Solomonik

Dense linear algebra kernels, such as linear solvers or tensor contractions, are fundamental components of many scientific computing applications. In this work, we present a novel method of deriving parallel I/O lower bounds for this broad…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-14 Grzegorz Kwasniewski, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Timo Schneider, Maciej Besta, Torsten Hoefler

As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…

Numerical Analysis · Mathematics 2008-08-12 Alfredo Buttari, Julien Langou, Jakub Kurzak, Jack Dongarra

Efficient task scheduling is paramount in parallel programming on multi-core architectures, where tasks are fundamental computational units. QR factorization is a critical sub-routine in Sequential Least Squares Quadratic Programming…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-12 Soumyajit Chatterjee, Rahul Utkoor, Uppu Eshwar, Sathya Peri, V. Krishna Nandivada

Interprocessor communication often dominates the runtime of large matrix computations. We present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the cost of increasing…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-15 Grey Ballard, James Demmel, Laura Grigori, Mathias Jacquelin, Nicholas Knight

Sketching is widely used in randomized linear algebra for low-rank matrix approximation, column subset selection, and many other problems, and it has gained significant traction in machine learning applications. However, sketching large…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-24 Hussam Al Daas, Grey Ballard, Laura Grigori, Md Taufique Hussain, Suraj Kumar, Mohammad Marufur Rahman, Kathryn Rouse

This paper describes a new QR factorization algorithm which is especially designed for massively parallel platforms combining parallel distributed multi-core nodes. These platforms make the present and the foreseeable future of…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-08-27 Jack Dongarra, Mathieu Faverge, Thomas Herault, Julien Langou, Yves Robert

Parallel matrix multiplication is one of the most studied fundamental problems in distributed and high performance computing. We obtain a new parallel algorithm that is based on Strassen's fast matrix multiplication and minimizes…

Data Structures and Algorithms · Computer Science 2012-02-16 Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz, Oded Schwartz

Multiple Tensor-Times-Matrix (Multi-TTM) is a key computation in algorithms for computing and operating with the Tucker tensor decomposition, which is frequently used in multidimensional data analysis. We establish communication lower…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-03 Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse

Factorizing large matrices by QR with column pivoting (QRCP) is substantially more expensive than QR without pivoting, owing to communication costs required for pivoting decisions. In contrast, randomized QRCP (RQRCP) algorithms have proven…

Numerical Analysis · Mathematics 2018-04-17 Jianwei Xiao, Ming Gu, Julien Langou

The matricized-tensor times Khatri-Rao product computation is the typical bottleneck in algorithms for computing a CP decomposition of a tensor. In order to develop high performance sequential and parallel algorithms, we establish…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-24 Grey Ballard, Nicholas Knight, Kathryn Rouse

We propose two distributed iterative algorithms that can be used to solve, in finite time, the distributed optimization problem over quadratic local cost functions in large-scale networks. The first algorithm exhibits synchronous operation…

In this article, we focus on the communication costs of three symmetric matrix computations: i) multiplying a matrix with its transpose, known as a symmetric rank-k update (SYRK); ii) adding the result of the multiplication of a matrix with…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse, Mathieu Verite