Related papers: Minimizing Communication in Linear Algebra

200 papers

Sketching is widely used in randomized linear algebra for low-rank matrix approximation, column subset selection, and many other problems, and it has gained significant traction in machine learning applications. However, sketching large…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-03-24 · Hussam Al Daas, Grey Ballard, Laura Grigori, Md Taufique Hussain, Suraj Kumar, Mohammad Marufur Rahman, Kathryn Rouse

Matrix factorizations are among the most important building blocks of scientific computing. State-of-the-art libraries, however, are not communication-optimal, underutilizing current parallel architectures. We present novel algorithms for…

Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case).…

Numerical Analysis · Computer Science · 2011-02-02 · Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz

Reducing communication - either between levels of a memory hierarchy or between processors over a network - is a key component of performance optimization (in both time and energy) for many problems, including dense linear algebra, particle…

Data Structures and Algorithms · Computer Science · 2020-03-03 · Grace Dinh, James Demmel

We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR. We prove optimality by extending…

Numerical Analysis · Mathematics · 2008-08-21 · James Demmel, Laura Grigori, Mark Hoemmen, Julien Langou

The communication cost of algorithms (also known as I/O-complexity) is shown to be closely related to the expansion properties of the corresponding computation graphs. We demonstrate this on Strassen's and other fast matrix multiplication…

Data Structures and Algorithms · Computer Science · 2011-09-12 · Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz

Multiple Tensor-Times-Matrix (Multi-TTM) is a key computation in algorithms for computing and operating with the Tucker tensor decomposition, which is frequently used in multidimensional data analysis. We establish communication lower…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-02-03 · Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse

Communication lower bounds have long been established for matrix multiplication algorithms. However, most methods of asymptotic analysis have either ignored the constant factors or not obtained the tightest possible values. Recent work has…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-05-27 · Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse

Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy, or between processors over a network. Communication often dominates arithmetic and…

Numerical Analysis · Mathematics · 2010-11-16 · Grey Ballard, James Demmel, Ioana Dumitriu

Mass spectrometry (MS) based omics data analysis requires significant time and resources. To date, few parallel algorithms have been proposed for deducing peptides from mass spectrometry-based data. However, these parallel algorithms were…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-08-12 · Fahad Saeed, Muhammad Haseeb, SS Iyengar

The matricized-tensor times Khatri-Rao product computation is the typical bottleneck in algorithms for computing a CP decomposition of a tensor. In order to develop high performance sequential and parallel algorithms, we establish…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-10-24 · Grey Ballard, Nicholas Knight, Kathryn Rouse

We consider the problem of multiplying sparse matrices (over a semiring) where the number of non-zero entries is larger than main memory. In the classical paper of Hong and Kung (STOC '81) it was shown that to compute a product of dense $U…

Data Structures and Algorithms · Computer Science · 2014-03-17 · Rasmus Pagh, Morten Stöckel

We give lower bounds on the communication complexity required to solve several computational problems in a distributed-memory parallel machine, namely standard matrix multiplication, stencil computations, comparison sorting, and the Fast…

Data Structures and Algorithms · Computer Science · 2013-09-24 · Michele Scquizzato, Francesco Silvestri

The movement of data (communication) between levels of a memory hierarchy, or between parallel processors on a network, can greatly dominate the cost of computation, so algorithms that minimize communication are of interest. Motivated by…

Classical Analysis and ODEs · Mathematics · 2013-08-03 · Michael Christ, James Demmel, Nicholas Knight, Thomas Scanlon, Katherine Yelick

In this article, we focus on the communication costs of three symmetric matrix computations: i) multiplying a matrix with its transpose, known as a symmetric rank-k update (SYRK); ii) adding the result of the multiplication of a matrix with…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-09-18 · Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse, Mathieu Verite

In this paper, we consider two fundamental symmetric kernels in linear algebra: the Cholesky factorization and the symmetric rank-$k$ update (SYRK), with the classical three nested loops algorithms for these kernels. In addition, we…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-02-22 · Olivier Beaumont, Lionel Eyraud-Dubois, Mathieu Vérité, Julien Langou

A tight lower bound for required I/O when computing an ordinary matrix-matrix multiplication on a processor with two layers of memory is established. Prior work obtained weaker lower bounds by reasoning about the number of segments needed…

Computational Complexity · Computer Science · 2019-02-07 · Tyler Michael Smith, Bradley Lowery, Julien Langou, Robert A. van de Geijn

Dense linear algebra kernels, such as linear solvers or tensor contractions, are fundamental components of many scientific computing applications. In this work, we present a novel method of deriving parallel I/O lower bounds for this broad…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-10-14 · Grzegorz Kwasniewski, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Timo Schneider, Maciej Besta, Torsten Hoefler

We propose a novel approach to iterated sparse matrix dense matrix multiplication, a fundamental computational kernel in scientific computing and graph neural network training. In cases where matrix sizes exceed the memory of a single…

In prior work, Gupta et al. (SPAA 2022) presented a distributed algorithm for multiplying sparse $n \times n$ matrices, using $n$ computers. They assumed that the input matrices are uniformly sparse: there are at most $d$ non-zeros in each…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-05-24 · Chetan Gupta, Janne H. Korhonen, Jan Studený, Jukka Suomela, Hossein Vahidi